(57) ABSTRACT

A system and method for estimating statistics concerning
system metrics to provide for the accurate and efficient
monitoring of one or more computer systems. The system
preferably comprises a distributed computing environment,
i.e., an enterprise, which comprises a plurality of
interconnected computer systems. At least one of the computer
systems is an agent computer system which includes agent
software and/or system software for the collection of data
relating to one or more metrics, i.e., measurements of system
resources. Metric data is continually collected over the
course of a measurement interval, regularly placed into a
registry of metrics, and then periodically sampled from the
registry indirectly. Sampling-related uncertainty and
inaccuracy arise from two primary sources: the unsampled
residual segments of seen (i.e., sampled and therefore
known) events, and unseen (i.e., unsampled and therefore
unknown) events. The total unsampled utilization and the
total unseen utilization are accurately estimated according to
the properties of one or more process service time
distributions. The total unseen utilization is also estimated
with an iterative method using gradations of the sample
interval. The length distribution of the unseen processes is
determined with the same iterative method.
ENTERPRISE MANAGEMENT SYSTEM AND
METHOD WHICH INCLUDES STATISTICAL
RECREATION OF SYSTEM RESOURCE
USAGE FOR MORE ACCURATE
MONITORING, PREDICTION, AND
PERFORMANCE WORKLOAD
CHARACTERIZATION

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to the collection, analysis,
and management of system resource data in distributed or
enterprise computer systems, and particularly to the more
accurate monitoring of the state of a computer system and
more accurate prediction of system performance.

2. Description of the Related Art

The data processing resources of business organizations
are increasingly taking the form of a distributed computing
environment in which data and processing are dispersed
over a network comprising many interconnected,
heterogeneous, geographically remote computers. Such a
computing environment is commonly referred to as an
enterprise computing environment, or simply an enterprise.
Managers of the enterprise often employ software packages
known as enterprise management systems to monitor,
analyze, and manage the resources of the enterprise.
Enterprise management systems may provide for the collection of
measurements, or metrics, concerning the resources of
individual systems. For example, an enterprise management
system might include a software agent on an individual
computer system for the monitoring of particular resources
such as CPU usage or disk access. U.S. Pat. No. 5,655,081
discloses one example of an enterprise management system.

[...] system resource utilization are useful for assuring the
satisfactory performance of one or more computer systems in
the enterprise. Examples of such analysis and modeling
tools are the "ANALYZE" and "PREDICT" components of
"BEST/1 FOR DISTRIBUTED SYSTEMS" available from
BMC Software, Inc. Such tools usually require the input of
periodic measurements of the usage of resources such as
central processing units (CPUs), memory, hard disks,
network bandwidth, and the like. To ensure accurate analysis
and modeling, therefore, the collection of accurate
performance data is critical.

Many modern operating systems, including "WINDOWS
NT" and UNIX, are capable of recording and maintaining an
enormous amount of performance data and other data
concerning the state of the hardware and software of a computer
system. Such data collection is a key step for any system
performance analysis and prediction. The operating system
or system software collects raw performance data, usually at
a high frequency, stores the data in a registry of metrics, and
then periodically updates the data. In most cases, metric data
is not used directly but is instead sampled from the registry.
Sampling at a high frequency, however, can consume
substantial system resources such as CPU cycles, storage space,
and I/O bandwidth. Therefore, it is impractical to sample the
data at a high frequency. On the other hand, infrequent
sampling cannot capture the complete system state: for
example, significant short-lived events and/or processes can
be missed altogether. Infrequent sampling may therefore
distort a model of a system's performance. The degree to
which the sampled data reliably reflects the raw data
determines the usefulness of the performance model for system
capacity planning. The degree of reliability also determines
the usefulness of the performance statistics presented to
end-users by performance tools.

Sensitivity to sampling frequency varies among data
types. Performance data can be classified into three
categories: cumulative, transient, and constant. Cumulative data is
data that accumulates over time. For example, a system CPU
time counter may collect the total number of seconds that a
processor has spent in system state since system boot. With
transient data, old data is replaced by new data. For example,
the amount of free memory is a transient metric which is
updated periodically to reflect the amount of memory not in
use. However, values such as the mean, variance, and
standard deviation can be computed based on a sampling
history of the transient metric. The third type of performance
data, constant data, does not change over the measurement
interval or lifetime of the event. For example, system
configuration information, process ID, and process start time
are generally constant values.

[...] performance metrics are the most [...] to variations in
the sample interval and are [...] to be characterized by
uncertainty. [...] However, cumulative data may [...]. Clearly,
[...] of data caused by [...] sampling can [...] problems in
performance modeling. Therefore, [...] the essence of [...] a
sufficient degree of certainty. [...] not a [...] because of the
heavy resource usage involved. [...] efficiently reflect system
resource usage at a lower sampling frequency.

SUMMARY OF THE INVENTION

The present invention is directed to a system and method
that meet the needs for more accurate and efficient
monitoring and prediction of computer system performance. In
the preferred embodiment, the system and method are used
in a distributed computing environment, i.e., an enterprise.
The enterprise comprises a plurality of computer systems, or
nodes, which are interconnected through a network. At least
one of the computer systems is a monitor computer system
from which a user may monitor the nodes of the enterprise.
At least one of the computer systems is an agent computer
system. An agent computer system includes agent software
and/or system software that permits the collection of data
relating to one or more metrics, i.e., measurements of system
resources on the agent computer system. In the preferred
embodiment, metric data is continually collected at a high
frequency over the course of a measurement interval and
placed into a registry of metrics. The metric data is not used
directly but rather is routinely sampled at a constant sample
interval from the registry of metrics. Because sampling uses
substantial system resources, sampling is performed at a
lesser frequency than the frequency of collection.

Sampled metric data can be used to build performance
models for analysis and capacity planning. However, less
frequent sampling can result in inaccurate models and data
uncertainty, especially regarding the duration of events or
processes and the number of events or processes. The
present invention is directed to reducing said uncertainty.
Uncertainty arises from two primary sources: the unsampled 
segment of a seen process or event, and the unseen process 
or event. A seen process is a process that is sampled at least
once; therefore, its existence and starting time are known.
However, the residual time or utilization between the last 
sampling of the process or event and the death of the process 
or the termination of the event is unsampled and unknown. 
An unseen process is shorter than the sample interval and is 
not sampled at all, and therefore its entire utilization is
unknown. Nevertheless, the total unsampled (i.e., residual) 
utilization and the total unseen utilization can be estimated 
with the system and method of the present invention. 
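
The distinction between seen and unseen processes can be illustrated with a minimal Python sketch. It assumes a simplified setting that is not taken from the source: samples fire at exact multiples of the sample interval, and the true start and end times of each process are known to the sketch (in practice the end time is exactly what is unknown). The function name `classify` and its parameters are hypothetical.

```python
import math

def classify(start, end, sample_interval):
    """Classify one process against samples taken at t = k * sample_interval.

    Returns (seen, captured, unknown):
      seen     -- True if at least one sample fell inside [start, end)
      captured -- time from process start to the last sample that saw it
      unknown  -- unsampled residual (seen) or entire lifetime (unseen)
    """
    # First sample instant at or after the process start.
    first = math.ceil(start / sample_interval) * sample_interval
    if first >= end:
        # No sample instant falls within the lifetime: an unseen process.
        return False, 0.0, end - start
    # Last sample instant strictly before the process end.
    last = first + math.floor((end - first) / sample_interval) * sample_interval
    if last >= end:
        last -= sample_interval
    return True, last - start, end - last
```

For instance, with a sample interval of 1.0, a process living from 0.5 to 3.2 is seen, with a captured segment of about 2.5 and an unsampled residual of about 0.2, whereas a process living from 0.25 to 0.75 is never sampled and its whole lifetime of 0.5 is unknown.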

In determining the total unsampled utilization, a quantity
of process service time distributions are determined, and
each of the seen processes is assigned a respective process
service time distribution. For each distribution, a mean
residual time is calculated using equations provided by the 
system and method. The total unsampled utilization is the 
sum of the mean residual time multiplied by the number of
seen processes for each distribution, all divided by the 
measurement interval. 
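
The final summation described above can be sketched in Python as follows. This is only an illustration of the arithmetic: the mean residual time for each distribution is taken as an input, because the equations that produce it are not reproduced in this passage. The function name and the `(mean_residual_time, seen_process_count)` pair layout are assumptions made for the example.

```python
def total_unsampled_utilization(groups, measurement_interval):
    """Sum (mean residual time * number of seen processes) over all
    process service time distributions, then divide by the
    measurement interval.

    groups -- iterable of (mean_residual_time, seen_process_count)
              pairs, one pair per distribution (hypothetical layout)
    """
    residual_time = sum(mean_residual * count
                        for mean_residual, count in groups)
    return residual_time / measurement_interval
```

With two distributions, one with a mean residual time of 0.4 seconds over 10 seen processes and one with 0.25 seconds over 8 seen processes, and a 60-second measurement interval, the sketch yields (4.0 + 2.0) / 60, or about 0.1.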

In determining the total unseen utilization, first the total 
captured utilization is determined to be the sum of the 
sampled utilizations of all seen processes over the measurement
interval. Next, the total measured utilization, or the
"actual" utilization over the measurement interval, is 
obtained from the system software or monitoring software. 
The difference between the total measured utilization and 
the total captured utilization is the uncertainty. Because the 
uncertainty is due to either unsampled segments or unseen 
events, the total unseen utilization is calculated to be the 
uncertainty (the total measured utilization minus the total 
captured utilization) minus the total unsampled utilization.
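
The subtraction described above is small enough to show as a worked Python sketch. The function and argument names are hypothetical; all three inputs are assumed to be utilizations expressed over the same measurement interval.

```python
def total_unseen_utilization(measured, captured, unsampled):
    """Unseen utilization = (measured - captured) - unsampled.

    measured  -- total measured ("actual") utilization
    captured  -- sum of sampled utilizations of all seen processes
    unsampled -- estimated total residual utilization of seen processes
    """
    uncertainty = measured - captured
    return uncertainty - unsampled
```

For example, a measured utilization of 0.60, a captured utilization of 0.45, and an unsampled utilization of 0.10 leave an uncertainty of 0.15, of which roughly 0.05 is attributed to unseen processes.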

When the total measured utilization is not available, the 
total unseen utilization is estimated with an iterative bucket 
method. A matrix of buckets is created, wherein each row
corresponds to the sample interval and each bucket to a
gradation of the sample interval. Each process is placed into
the appropriate bucket according to how many times it was
sampled and when in the sample interval it began. Starting
with the bucket with the longest process(es) and working
iteratively back through the other buckets, the number of
unseen processes is estimated for each length gradation of
the sample interval. The iterative bucket method is also used 
to determine a length distribution of unseen processes. 
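
Only the bucket placement step lends itself to a sketch from the description above, since the iterative estimation equations are not reproduced in this passage. The following Python fragment shows one plausible reading, stated as an assumption: each seen process is keyed by the number of times it was sampled and by the gradation of the sample interval in which it began. The function name, the input layout, and the use of equal-width gradations are all hypothetical.

```python
from collections import Counter

def fill_buckets(processes, sample_interval, gradations):
    """Count seen processes per (times_sampled, gradation_index) bucket.

    processes  -- iterable of (start_time, times_sampled) pairs
    gradations -- number of equal-width subdivisions of the sample
                  interval (an assumption of this sketch)
    """
    width = sample_interval / gradations
    buckets = Counter()
    for start, times_sampled in processes:
        # Offset of the process start within its sample interval.
        offset = start % sample_interval
        # Clamp to the last gradation to guard against rounding.
        column = min(int(offset // width), gradations - 1)
        buckets[(times_sampled, column)] += 1
    return buckets
```

The resulting counts would then feed the iterative step, which works back from the bucket holding the longest processes; that step is deliberately left out here.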

In response to the determination of utilizations described 
above, the system and method are able to use this information
in modeling and/or analyzing the enterprise. In various
embodiments, the modeling and/or analyzing may further
comprise one or more of the following: displaying the
determinations to a user, predicting future performance,
graphing a performance prediction, generating reports, asking
a user for further data, permitting a user to modify a
model of the enterprise, and altering a configuration of the 
enterprise in response to the determinations. 

BRIEF DESCRIPTION OF THE DRAWINGS 

A better understanding of the present invention can be
obtained when the following detailed description of the 
preferred embodiment is considered in conjunction with the 
following drawings, in which: 

FIG. 1 is a network diagram of an illustrative enterprise 
computing environment;

FIG. 2 is an illustration of a typical computer system with 
computer software programs; 
FIG. 3 is a block diagram illustrating an overview of the
enterprise management system according to the preferred 
embodiment of the present invention; 

FIG. 4 is a block diagram illustrating an overview of the 
Monitor component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 5 is a block diagram illustrating an overview of the 
Agent component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 6 is a block diagram illustrating an overview of the 
Analyze component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 7 is a block diagram illustrating an overview of the 
Predict component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 8 is a flowchart illustrating an overview of the 
collection and sampling of metric data; 

FIG. 9 is a diagram illustrating an unsampled segment of 
a seen event; 

FIG. 10 is a diagram illustrating an unseen event; 

FIG. 11 is a flowchart illustrating an overview of the 
estimation of metric data statistics; 

FIG. 12 is a flowchart illustrating the determination of the 
total uncaptured utilization; 

FIG. 13 is a flowchart further illustrating the determina- 
tion of the total uncaptured utilization; 

FIG. 14 is a flowchart illustrating the determination of the 
portion of the total uncaptured utilization for an exponential 
distribution; 

FIG. 15 is a flowchart illustrating the determination of the 
portion of the total uncaptured utilization for a uniform 
distribution; 

FIG. 16 is a flowchart illustrating the determination of the 
portion of the total uncaptured utilization for an unknown 
distribution; 

FIG. 17 is a flowchart illustrating an alternative method of 
the determination of the portion of the total uncaptured 
utilization for an unknown distribution; 

FIG. 18 is a flowchart illustrating the determination of the 
total unseen utilization; 

FIG. 19 illustrates a matrix of buckets used in the esti- 
mation of the total unseen utilization; 

FIG. 20 illustrates a specific example of the estimation of 
the total unseen utilization with buckets; 

FIG. 21 is a flowchart illustrating the iterative bucket 
method of estimating the total unseen utilization; 

FIGS. 22 and 23 are equations which are used to generate 
a length distribution of the unseen processes. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

U.S. Pat. No. 5,655,081 titled "System for Monitoring and
Managing Computer Resources and Applications Across a
Distributed Environment Using an Intelligent Autonomous
Agent Architecture" is hereby incorporated by reference as
though fully and completely set forth herein.

U.S. Pat. No. 5,761,091 titled "Method and System for
Reducing the Errors in the Measurements of Resource
Usage in Computer System Processes and Analyzing Process
Data with Subsystem Data" is hereby incorporated by
reference as though fully and completely set forth herein.

FIG. 1 illustrates an enterprise computing environment
according to one embodiment of the present invention. An
enterprise 100 comprises a plurality of computer systems
which are interconnected through one or more networks.
Although one particular embodiment is shown in FIG. 1, the
enterprise 100 may comprise a variety of heterogeneous
computer systems and networks which are interconnected in
a variety of ways and which run a variety of software
applications.

One or more local area networks (LANs) 104 may be
included in the enterprise 100. A LAN 104 is a network that
spans a relatively small area. Typically, a LAN 104 is
confined to a single building or group of buildings. Each
node (i.e., individual computer system or device) on a LAN
104 preferably has its own CPU with which it executes
programs, and each node is also able to access data and
devices anywhere on the LAN 104. The LAN 104 thus
allows many users to share devices (e.g., printers) as well as
data stored on file servers. The LAN 104 may be
characterized by any of a variety of types of topology (i.e., the
geometric arrangement of devices on the network), of
protocols (i.e., the rules and encoding specifications for sending
data, and whether the network uses a peer-to-peer or
client/server architecture), and of media (e.g., twisted-pair wire,
coaxial cables, fiber optic cables, radio waves). As
illustrated in FIG. 1, the enterprise 100 includes one LAN 104.
However, in alternate embodiments the enterprise 100 may
include a plurality of LANs 104 which are coupled to one
another through a wide area network (WAN) 102. A WAN
102 is a network that spans a relatively large geographical
area.

Each LAN 104 comprises a plurality of interconnected
computer systems and optionally one or more other devices:
for example, one or more workstations 110a, one or more
personal computers 112a, one or more laptop or notebook
computer systems 114, one or more server computer systems
116, and one or more network printers 118. As illustrated in
FIG. 1, the LAN 104 comprises one of each of computer
systems 110a, 112a, 114, and 116, and one printer 118. The
LAN 104 may be coupled to other computer systems and/or
other devices and/or other LANs 104 through a WAN 102.

One or more mainframe computer systems 120 may
optionally be coupled to the enterprise 100. As shown in
FIG. 1, the mainframe 120 is coupled to the enterprise 100
through the WAN 102, but alternatively one or more
mainframes 120 may be coupled to the enterprise 100 through
one or more LANs 104. As shown, the mainframe 120 is
coupled to a storage device or file server 124 and mainframe
terminals 122a, 122b, and 122c. The mainframe terminals
122a, 122b, and 122c access data stored in the storage
device or file server 124 coupled to or comprised in the
mainframe computer system 120.

The enterprise 100 may also comprise one or more
computer systems which are connected to the enterprise 100
through the WAN 102: as illustrated, a workstation 110b and
a personal computer 112b. In other words, the enterprise 100
may optionally include one or more computer systems
which are not coupled to the enterprise 100 through a LAN
104. For example, the enterprise 100 may include computer
systems which are geographically remote and connected to
the enterprise 100 through the Internet.

The present invention preferably comprises computer
programs 160 stored on or accessible to each computer
system in the enterprise 100. FIG. 2 illustrates computer
programs 160 and a typical computer system 150. Each
computer system 150 typically comprises components such
as a CPU 152, with an associated memory media. The
memory media stores program instructions of the computer
programs 160, wherein the program instructions are
executable by the CPU 152. The memory media preferably
comprises a system memory such as RAM and/or a nonvolatile
memory such as a hard disk. The computer system 150
further comprises a display device such as a monitor 154, an
alphanumeric input device such as a keyboard 156, and
optionally a directional input device such as a mouse 158.
The computer system 150 is operable to execute computer
programs 160.

When the computer programs are executed on one or
more computer systems 150, an enterprise management
system 180 is operable to monitor, analyze, and manage the
computer programs, processes, and resources of the
enterprise 100. Each computer system 150 in the enterprise 100
executes or runs a plurality of software applications or
processes. Each software application or process consumes a
portion of the resources of a computer system and/or
network: for example, CPU time, system memory such as
RAM, nonvolatile memory such as a hard disk, network
bandwidth, and input/output (I/O). The enterprise
management system 180 permits users to monitor, analyze, and
manage resource usage on heterogeneous computer systems
150 across the enterprise 100.

FIG. 3 shows an overview of the enterprise management
system 180. The enterprise management system 180
includes at least one console node 400 and at least one agent
node 300, but it may include a plurality of console nodes 400
and/or a plurality of agent nodes 300. In general, an agent
node 300 executes software to collect metric data on its
computer system 150, and a console node 400 executes
software to monitor, analyze, and manage the collected
metrics from one or more agent nodes 300. A metric is a
measurement of a particular system resource. For example,
in the preferred embodiment, the enterprise management
system 180 collects metrics such as CPU, disk I/O, file
system usage, database usage, threads, processes, kernel,
registry, logical volumes, and paging. Each computer system
150 in the enterprise 100 may comprise a console node 400,
an agent node 300, or both a console node 400 and an agent
node 300. In the preferred embodiment, server computer
systems include agent nodes 300, and other computer
systems may also comprise agent nodes 300 as desired, e.g., file
servers, print servers, e-mail servers, and internet servers.
The console node 400 and agent node 300 are characterized
by an end-by-end relationship: a single console node 400
may be linked to a single agent node 300, or a single console
node 400 may be linked to a plurality of agent nodes 300, or
a plurality of console nodes 400 may be linked to a single
agent node 300, or a plurality of console nodes 400 may be
linked to a plurality of agent nodes 300.

In the preferred embodiment, the console node 400
comprises four user-visible components: a Monitor component
402, a Collect graphical user interface (GUI) 404, an
Analyze component 406, and a Predict component 408. In one
embodiment, all four components 402, 404, 406, and 408 of
the console node 400 are part of the "BEST/1 FOR
DISTRIBUTED SYSTEMS" software package or the
"PATROL" software package, all available from BMC
Software, Inc. The agent node 300 comprises an Agent 302,
one or more data collectors 304, Universal Data Repository
(UDR) history files 210a, and Universal Data Format (UDF)
history files 212a. In alternate embodiments, the agent node
300 includes either of UDR 210a or UDF 212a, but not both.
The Monitor component 402 allows a user to monitor, in
real-time, data that is being collected by an Agent 302 and
being sent to the Monitor 402. The Collect GUI 404 is
employed to schedule data collection on an agent node 302.
The Analyze component 406 takes historical data from a
UDR 210a and/or UDF 212a to create a model of the
enterprise 100. The Predict component 408 takes the model
from the Analyze component 406 and allows a user to alter
the model by specifying hypothetical changes to the
enterprise 100. Analyze 406 and Predict 408 can create output in
a format which can be understood and displayed by a
Visualizer tool 410. In the preferred embodiment, Visualizer
410 is the "BEST/1-VISUALIZER" available from BMC
Software, Inc. In one embodiment, Visualizer 410 is also
part of the console node 400.

The Agent 302 controls data collection on a particular
computer system and reports the data in real time to one or
more Monitors 402. In the preferred embodiment, the Agent
302 is the part of the "BEST/1 FOR DISTRIBUTED
SYSTEMS" software package available from BMC Software,
Inc. The data collectors 304 collect data from various
processes and subsystems of the agent node 300. The Agent
302 sends real-time data to the UDR 210a, which is a
database of historical data in a particular data format. The
UDF 212a is similar to the UDR 210a, but the UDF 212a
uses an alternative data format and is written directly by the
data collectors 304.

FIG. 4 shows an overview of the Monitor component 402
of the console node 400 of the enterprise management
system 180. The Monitor 402 comprises a Manager Daemon
430, one or more Monitor Consoles (as illustrated, 420a and
420b), and a Policy Registration Queue 440. Although two
Monitor Consoles 420a and 420b are shown in FIG. 4, the
present invention contemplates that one or more Monitor
Consoles may be executing on any of one or more console
nodes 400.

In the preferred embodiment, the Monitor Consoles 420a
and 420b use a graphical user interface (GUI) for user input
and information display. Preferably, the Monitor Consoles
420a and 420b are capable of sending several different types
of requests to an Agent 302, including: alert requests, update
requests, graph requests, and drilldown requests. An alert
request specifies one or more thresholds to be checked on a
routine basis by the Agent 302 to detect a problem on the
agent node 300. For example, an alert request might ask the
Agent 302 to report to the Monitor Console 420a whenever
usage of a particular software process exceeds a particular
threshold relative to overall CPU usage on the agent node
300. An update request is a request for the status of the Agent
302. For example, the requested status information might
include the version number of the Agent 302 or the presence
of any alarms in the Agent 302. A graph request is a request
to receive graph data, i.e., data on a metric as routinely
collected by the Agent 302, and to receive the data in real
time, i.e., whenever it becomes available from the present
time onward. By obtaining and displaying graph data, the
Monitor Console 420a enables the rapid identification and
communication of potential application and system
performance problems. Preferably, the Monitor Console 420a
displays graph data in a graphical format. A drilldown
request is a request to receive drilldown data, i.e., data on an
entire metric group (a set of metrics) as collected by the
Agent 302. By obtaining and displaying drilldown data, the
Monitor Console 420a provides the ability to focus, in
real-time, on a specific set of processes, sessions, or users.
Preferably, the Monitor Console 420a displays drilldown
data in a tabular format.

Whenever the Agent 302 generates an alarm to indicate a
troublesome status on the agent node 300, the Manager
Daemon 430 intercepts the alarm and feeds the alarm to one
or more Monitor Consoles, such as 420a and 420b.
Typically, an alarm is a notification that a particular
threshold has been exceeded on a monitored process or subsystem
on an agent node 300. The Manager Daemon 430 is capable
of receiving alarms from a plurality of Agents 302. A
Manager Daemon 430 is preferably always running on each
console node 400 so that alarms can be captured even when
the Monitor Consoles 420a and 420b are offline.

Each of the Monitor Consoles 420a and 420b is able to
issue one or more policies. A policy defines a disparate set
of metrics to be collected on one or more agent nodes 300.
In other words, a policy allows a Monitor Console 420a or
420b to monitor one or more metrics on one or more agent
nodes 300 simultaneously. For example, a user could build
and deploy a policy that restricts web browser access on a
plurality of agent nodes 300 with the following set of
interrelated conditions: "IF more than 80% of server CPU is
required by critical production applications, AND the run
queue length is greater than six, AND active time on
production disks exceeds 40%." Policies are registered with
the Policy Registration Queue 440, from which they are
disseminated to the appropriate Agents 302. An Agent 302
can execute a plurality of policies simultaneously.

FIG. 5 shows an overview of the Agent component 302 of
the agent node 300 of the enterprise management system
180. In the preferred embodiment, every agent node 300 has
one Agent 302. The Monitor Console 420c is another
instance of the Monitor Consoles illustrated in FIG. 4 with
reference numbers 420a and 420b.

When the user desires to start an Agent 302 and begin
collecting data on a particular agent node 300, the user
operates the Monitor Console 420c to issue an agent start
request through a Service Daemon 202b. Preferably, the
Service Daemon 202b is always executing on the agent node
300 in order to intercept messages from one or more Monitor
Consoles 420c even when the Agent 302 is offline. In the
preferred embodiment, the Service Daemon 202b is largely
invisible to the user. The Service Daemon 202b also
intercepts agent version queries from the Monitor Console 420c.
An agent version query is a request for the current version
number of the piece of software that comprises the Agent
302. As described above, the Monitor Console 420c is able
to send alert requests, update requests, graph requests, and
drilldown requests to the Agent 302. The Monitor Console
420c may also send collection requests, which are requests
for the Agent 302 to begin collecting particular metrics or
metric groups on the agent node 300.

When the Agent 302 receives a collect request from the
Monitor Console 420c through the Service Daemon 202b,
the Agent 302 initiates the collection through the Collect
Registry Queue (CRQ) 340. The Agent 302 uses the Collect
Registry Queue 340 to control and schedule data collection.
By helping the Agent 302 know how many collectors 304
are running and whether the collectors 304 are each the right
type, the Collect Registry Queue 340 prevents redundant
collection. Each data collector 310, 312, 314, 316, 318, and
320 is designed to gather one or more metrics for the
operating system and/or one or more subsystems. The
present invention contemplates a variety of data collectors
304, but for illustrative purposes, the following are shown:
system data collector 310 (which collects data from the
operating system), ARM data collector 312 (which collects
data from ARMed applications 324), UMX data collector
314 (which collects data from user scripts/programs 326),
Oracle data collector 316 (which collects data from an
"ORACLE" database management system), Informix data
collector 318 (which collects data from an "INFORMIX"
database management system), and Sybase data collector
320 (which collects data from a "SYBASE" database
management system). Each of the collectors 310, 312, 314, 316,
318, and 320 has an associated input queue 322a, 322b,
322c, 322d, 322e, and 322f, respectively. The input queues
322a, 322b, 322c, 322d, 322e, and 322f store the requested
metric groups and associated collection intervals for each
collector 304. Although a collector 304 typically supports
multiple metric groups, the collector 304 only collects those
metric groups that are requested. After metric data is
collected, the data is transferred to a Metric Repository 350.
The Metric Repository 350 sits between the Agent 302 and
the collectors 304 and provides fast interprocess
communication between the Agent process 302 and the collector
processes 304.

Metric data from the Metric Repository 350 is efficiently
copied into the Metric Repository Pool 352, where the data
is cached by metric group, instance, and collection rate. The
Metric Repository Pool 352 is located in the memory space
of the Agent 302 and is invisible to everything other than the
Agent 302. By storing collected data for the metric groups
in a single Metric Repository Pool 352 for each Agent 302
and agent node 300, the enterprise management system 180
prevents redundant collection: whether one Monitor
Console 420c or a plurality of Monitor Consoles such as 420a
through 420c request data collection for a particular metric
group, the data is only collected once.

In the preferred embodiment, the Collect Registry Queue
340, Metric Repository 350, Metric Repository Pool 352,
input queues 322a, 322b, 322c, 322d, 322e, and 322f, and
Universal Data Repository (UDR) history files 210a, 210b,
210c, and 210d comprise a data structure called a base queue
or BASEQ. A BASEQ is a contiguous relocatable heap of
memory: in other words, the BASEQ provides random
allocation of data in a contiguous block of storage. The
BASEQ provides fast interprocess communication with [...]

[...] of levels of varying granularity by sending data at each
successive level through an intelligent summarization
process according to the present invention. Historical data can
also be stored in a Central Repository 440 on the console
node 400. A Service Daemon 202a controls the data transfer
from the Remote Repository 360 to the Central Repository
440. In the preferred embodiment, the Central Repository
440 comprises a UDR 210d.

FIG. 6 illustrates an overview of the Analyze component
406 of the console node 400 of the enterprise management
system 180. In the preferred embodiment, Analyze 406
comprises the "ANALYZE" portion of the "BEST/1 FOR
DISTRIBUTED SYSTEMS" software package available
from BMC Software, Inc. Essentially, Analyze 406 takes the
data collected by one or more Agents 302 and creates a
model of one or more computer systems and the processes
that run on those computer systems. In the preferred
embodiment, Analyze 406 can model multi-vendor
environments, system memory, multiple processors, disk
drives, logical volumes, RAID devices, load balancing,
ASCII and X terminals, local and remote file servers,
independent and dependent transactions, client/server
workloads, private and shared memory/transaction, CPU
priority scheduling, networks of different types, and
"ORACLE", "SYBASE", and "INFORMIX" database
environments. In the preferred embodiment, Analyze 406 takes
as input a domain file 466 which identifies the agent nodes
300 on the network and the relationship between them. As
shown in FIG. 6, Analyze 406 also takes as input a data
repository in either UDF 212c or UDR 210c format, wherein
the data repository 212c or 210c is a set of metric groups
collected from one or more agent nodes 300.

The Analyze user then can either use a default workload
specification (.an) 464 or create his or her own, either with
the supplied graphical user interface (GUI) 460 or with a
standard text editor 461. A workload specification 464
includes a user name, a process name, and other information.
A workload is a useful grouping of key performance metrics.
For example, the user might classify a plurality of Oracle-
locking synchronization between the consumer of data and 40 related processes as an "Oracle" workload, a plurality of 
the provider of data. The BASEQ can be stored in different other processes as a "payroll" workload, and the remainder 
types of memory, such as volatile memory like RAM or as a "miscellaneous" workload. From this classification data, 
nonvolatile memory like a hard disk. In the preferred the Analyze engine 406 creates an Analyze GUI file 462 
embodiment, the BASEQ is implemented as a base class in which contains a list of processes captured within the 
an object-oriented programming environment. In this 45 analysis interval. The Analyze GUI file 462 is then passed to 
embodiment, specialized variants of the BASEQ are imple- the Analyze GUI 460. 

mented as derived classes which inherit the properties of the Using the Analyze GUI file 462, the domain file 466, and 
base class. For example, UDR 210a, 2106, 210c, and 210d the UDF 212c or UDR 210c data repository, Analyze 406 
are implemented with a derived class which is located on a can create several forms of output. First, Analyze 406 can 
file on disk, while Metric Repository 350 is implemented 50 create a model file 468a. The model file 468a is a model of 
with a derived class which is located in a shared memory the workload data as contained in UDF 212c or UDR 210c 
segment. and as classified by the user through the Analyze GUI 460 

In the preferred embodiment, the enterprise management and/or standard text editor 461. Second, Analyze 406 can 
system 180 provides for the storage of historical metric data create reports 472a, which comprise the results of user- 
aswell as the monitoring of real-time metric data. Therefore, 55 specified queries concerning workload characteristics For 
in addition to passing the metric data to the Monitor Console example, one instance of reports 472a could be a list of the 
420c the Agent may also send the metric data to a Remote top ten workloads sorted by total CPU usage. Third, Analyze 
Repository 360 for storage. The Remote Repository 360 is 406 can create a Visualizer file 470a, wherein the Visualize 
located on the agent node 300, and each agent node 300 may file 470a is a description of the characteristics of the 
have its own Remote Repository 360. The Remote Reposi- 60 enterprise 100 as determined by the collected metrics and 
tory comprises a database in the Universal Data Repository the user input. The Visualizer file 470a can be read and 
(UDR) format 2106 and/or a database in the Universal Data utilized by the Visualizer tool 410. In the preferred 
Format (UDF) format 212b. The UDF 2126 is an alternative embodiment, Visualizer 410 is the "BEST/1 -VISUALIZER" 
data format to the UDR 2106 and is used primarily by older available from BMC Software, Inc. With Visualizer 410, 
ones of the collectors 304. The UDR format 2106 is multi- 65 performance statistics and workloads can be graphed, 
node: it can store data from multiple sources in one place. compared, drilled down, and visually analyzed to pinpoint 
UDR 2106 is also multi-rate: it can store data at a plurality hot spots or trends to assist in resource management, system 
tuning, and configuration changes. Visualizer 410 preferably includes functionality known as MASF (Multivariate Adaptive Statistical Filtering). Using standard deviation techniques, MASF continually interprets performance data and calculates normalcy. MASF graphs are thus used to discover true performance anomalies that deviate from normal performance behavior. In addition to creating Visualizer file 470a and reports 472a, Analyze 406 also generates Model files 468a for performance prediction of the system within an enterprise computing environment 100.

FIG. 7 shows an overview of the Predict component 408 of the console node 400 of the enterprise management system 180. In the preferred embodiment, Predict 408 comprises the "BEST/1-PREDICT" component of the "BEST/1 FOR DISTRIBUTED SYSTEMS" software package available from BMC Software, Inc. Predict 408 is a planning tool which forecasts the impact of hypothetical changes on elements of the enterprise 100 such as disparate hardware, software, applications, and databases. Predict 408 takes the workload data from a Model File 468c, such as the Model File 468a generated by Analyze 406, and computes performance statistics such as workload response times, utilization, and throughputs at CPUs, disks, networks, and other elements of the enterprise computing environment 100. Thus, Predict 408 constructs a baseline model from collected data that represents the essence of the system under management. The user can also operate Predict 408 to construct the baseline model from pre-built model components, or from a combination of collected data and pre-built components. Preferably, Predict 408 uses a graphical user interface (GUI) for user input and information display.

After the baseline model has been constructed, the user can modify the baseline model by specifying configuration corrections, configuration changes, and/or growth scenarios. With Predict 408, the user can change one or more attributes of any model, creating "what if?" or hypothetical scenarios. By using methods, modeling techniques, and statistical formulas taken from queuing theory, Predict 408 accurately determines the impact of these workload and configuration changes on performance and response time. As one of the results of "what if?" computation, the changes to the baseline are displayed as unitless, numerical response time values relative to the baseline value of one. In the preferred embodiment, response times are broken down into four key components: CPU service time and wait time, I/O service time and wait time, network service time and wait time, and wait time for transactions running on external systems. Using the four key components, Predict 408 also preferably calculates other critical performance metrics such as throughput rates, CPU queue lengths, disk queue lengths, paging rates, and the amount of memory required to eliminate excessive paging.

Predict 408 preferably includes a multivendor hardware table 469 wherein the table includes the hardware specifications that Predict 408 uses to calculate the performance of hypothetical changes to the enterprise 100. Therefore, changes to CPU, memory, I/O, priorities, transaction rates, and other attributes can be evaluated across a plurality of heterogeneous computer systems 150. Furthermore, in modeling the configuration and workload changes across multiple systems, Predict 408 automatically calculates interaction and interference between systems. Predict 408 also preferably provides scenario planning, or modeling incremental growth over time, in order to determine the life expectancy of computing resources and the point at which resources should be upgraded to ensure that performance remains at an acceptable level. In the various ways set forth above, Predict 408 thus permits a user to plan for the future by "test driving" both actual and alternative or hypothetical configurations of the enterprise 100.

Like Analyze 406, Predict 408 can generate reports 472b, a Visualizer file 470b, and a model file 468b. The model file 468b can be modified and passed back to Predict 408 for additional modeling.

Collecting, Sampling, and Statistically Recreating Metric Data

Performance measurement is the process of gathering data concerning the state of the hardware and/or software of a computer system. In one embodiment, system software and/or data collectors 304 continually monitor one or more elements of the computer system and collect raw metric data relating to system performance, preferably at a high frequency. The metric data is written to a memory and periodically updated. The memory is preferably a registry of metrics. Often, different metrics are not updated at the same time or in the same interval. However, it is assumed that the raw data in the registry accurately reflects the system state of interest.

In a preferred embodiment, data in the registry is not used directly. Rather, the data is periodically sampled from the registry of metrics indirectly through the process of second-hand sampling. Such second-hand sampling is preferably performed less frequently than the frequency at which data is collected and placed into the registry of metrics. Because second-hand sampling itself uses system resources such as I/O, storage space, and CPU time, it is impractical and inefficient to sample the registry of metrics at a very high frequency: that is, at a frequency nearing the usually high frequency at which raw data is written to the registry of metrics. On the other hand, if data is sampled from the registry too infrequently, then a model created with the second-hand data may not be as accurate as desired. For example, significant short-lived events and/or processes can be missed altogether if the interval between samples is too large. Infrequent sampling may therefore distort a model of a system's performance. The degree to which the sampled data reliably reflects the raw data determines the usefulness of the performance model for system capacity planning. The degree of reliability also determines the usefulness of the performance statistics presented to end-users by performance tools.

Sensitivity to sampling frequency varies among data types. Generally, performance data can be classified into three categories: cumulative, transient, and constant. Cumulative data is data that accumulates over time. For example, a system CPU time counter may collect the total number of seconds that a processor has spent in system state since system boot. With transient data, old data is replaced by new data. For example, the amount of free memory is a transient metric which is updated periodically to reflect the amount of memory not in use. However, values such as the mean, variance, and standard deviation can be computed based on a sampling history of the transient metric. The third type of performance data, constant data, does not change over the measurement interval or lifetime of the event. For example, system configuration information, process ID, and process start time are generally constant values. Of the three data types, transient performance metrics are the most sensitive to variations in the sample interval and are therefore the most likely to be characterized by uncertainty. For example, with infrequent sampling, some state changes may be missed completely. However, cumulative data may also be rendered uncertain by infrequent sampling, especially with regard to the variance of such a metric.
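The difference in sensitivity between cumulative and transient data can be illustrated with a small Python sketch. It is only an illustration under assumed inputs: the helper `sample_every` and the synthetic `busy` and `free_memory` series are hypothetical and do not come from the described system.

```python
def sample_every(values, step):
    """Return every `step`-th raw value, mimicking infrequent second-hand sampling."""
    return values[::step]

# Hypothetical raw registry history, one entry per raw update.
busy = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0]   # CPU busy (1) or idle (0) per tick

# Cumulative metric: running total of busy ticks since "boot".
cpu_ticks = []
total = 0
for b in busy:
    total += b
    cpu_ticks.append(total)

# Transient metric: each new value replaces the old one.
free_memory = [90, 20, 85, 88, 91, 15, 90, 89, 87, 10, 92, 90]

step = 4
cum_samples = sample_every(cpu_ticks, step)
mem_samples = sample_every(free_memory, step)

# The cumulative counter still yields the exact total between two samples ...
busy_between = cum_samples[-1] - cum_samples[0]
# ... while the sampled transient metric misses every short dip in free memory.
lowest_seen = min(mem_samples)
lowest_actual = min(free_memory)
```

With these assumed inputs the cumulative samples recover the exact busy total between the first and last sample, whereas the sampled minimum of free memory is far from the true minimum, which is the kind of missed state change described above.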
The following table contains a nonexhaustive list of
examples of major performance metrics and their data types.
The table also presents guidelines as to how often the
metrics should preferably be sampled.

METRIC                         DATA TYPE     SAMPLE RATE (SEC)

Disk queue length              Cumulative    5 to 15
CPU queue length               Cumulative    5 to 15
I/O counts                     Cumulative    5 to 15
Number of processes            Cumulative    5 to 15
Memory in use                  Transient     1 to 3
Memory size                    Constant      3600
Disk busy time                 Transient     1 to 3
In (out) network packets       Cumulative    5 to 15
Number of bytes in a packet    Cumulative    5 to 15
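One way to carry such guidelines into a collector configuration is a simple lookup table. The following Python sketch is an assumption-laden illustration: the names `SAMPLING_GUIDELINES` and `suggested_sample_interval` are hypothetical, and the values are transcribed from the table above as read in row order.

```python
# Hypothetical encoding of the guideline table:
# metric name -> (data type, (shortest, longest) suggested sample interval in seconds).
SAMPLING_GUIDELINES = {
    "Disk queue length": ("cumulative", (5, 15)),
    "CPU queue length": ("cumulative", (5, 15)),
    "I/O counts": ("cumulative", (5, 15)),
    "Number of processes": ("cumulative", (5, 15)),
    "Memory in use": ("transient", (1, 3)),
    "Memory size": ("constant", (3600, 3600)),
    "Disk busy time": ("transient", (1, 3)),
    "In (out) network packets": ("cumulative", (5, 15)),
    "Number of bytes in a packet": ("cumulative", (5, 15)),
}

def suggested_sample_interval(metric, prefer="max"):
    """Return the longest (default) or shortest suggested interval for a metric."""
    _data_type, (shortest, longest) = SAMPLING_GUIDELINES[metric]
    return shortest if prefer == "min" else longest
```

For example, `suggested_sample_interval("Memory in use")` gives the longest interval still within the guideline for that transient metric, which keeps sampling overhead as low as the guideline allows.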



Performance data is collected according to a measurement
structure, wherein the measurement structure comprises a
measurement interval and sample interval or number of
samples. The measurement interval or collection interval L
is a continuous time segment during which raw performance
data is collected. The measurement interval is delineated by
its beginning time and its ending time. The sample interval
Δ is the time between two consecutive samples. In the
preferred embodiment, the sample interval is a constant
value. The number of samples n is the total number of
samples taken during the measurement interval. The
relationship among these three parameters is:

$$L = (n - 1)\Delta$$
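As a quick worked example of this relationship, the Python sketch below converts between the three parameters. The function names are hypothetical, and the sketch assumes that L is an exact multiple of Δ.

```python
def measurement_interval(n_samples, delta):
    """L = (n - 1) * delta for n samples spaced delta apart."""
    if n_samples < 1:
        raise ValueError("at least one sample is required")
    return (n_samples - 1) * delta

def number_of_samples(length, delta):
    """n = L / delta + 1, assuming L is a whole multiple of delta."""
    return int(round(length / delta)) + 1
```

For instance, 61 samples taken 5 seconds apart span a 300-second measurement interval, and a 300-second interval sampled every 5 seconds contains 61 samples.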

The events being sampled may include, for example,
process lifetimes, process types, or disk access times, or any
other performance metrics that can be monitored. Although
this description addresses in detail examples such as CPU
utilization, process lifetime, and process type, the system
and method can be applied to any metric. As used herein,
"process" refers to an executing program, a task, a thread, or
any other unit of execution.

FIG. 8 is a flowchart illustrating an overview of the
collection and sampling of metric data. In step 700 raw
performance data is collected by system software or data
collectors 304 at a high frequency. The raw performance
data relates to one or more processes on one or more
computer systems or networks. In step 702 the raw data
points are stored and/or updated in the registry of metrics. As
shown in step 704, the collecting and updating steps 700 and
702, respectively, are performed for as long as the sample
interval Δ has not expired. When the sample interval has
expired, in step 706 the registry of metrics is sampled. The
sampling creates a set of sampled data points. As shown in
step 708, steps 700, 702, 704, and 706 are performed
repetitively as long as the measurement interval L has not
expired. When the measurement interval L has expired, the
collection and sampling end.
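The control flow of FIG. 8 can be sketched in Python using simulated integer time. This is a minimal sketch and not the described implementation: `collect_and_sample`, `read_raw`, and `raw_interval` are hypothetical names, and a real collector would rely on timers and shared memory rather than a loop counter.

```python
def collect_and_sample(read_raw, measurement_interval, sample_interval, raw_interval=1):
    """Simulated-time sketch of steps 700 through 708.

    read_raw(now) is a caller-supplied function returning a dict of raw metric
    values at simulated time `now`. All times are integers in the same unit.
    """
    registry = {}                       # registry of metrics (latest raw values)
    samples = []                        # sampled data points
    now = 0
    next_sample_at = sample_interval
    while now < measurement_interval:   # step 708: measurement interval not expired
        now += raw_interval
        registry.update(read_raw(now))  # steps 700 and 702: collect and store raw data
        if now >= next_sample_at:       # step 704: sample interval has expired
            samples.append((now, dict(registry)))  # step 706: sample the registry
            next_sample_at += sample_interval
    return samples

# Usage with a hypothetical raw source that simply reports the current tick.
result = collect_and_sample(lambda now: {"cpu_ticks": now}, 20, 5)
```

In this run the registry is updated on every tick but copied out only every fifth tick, so most raw updates are never observed directly by the sampler.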

For performance modeling, two measurements are often
key: the duration of an event (e.g., a process), or its service
time; and the number of events, which is equivalent to the
arrival rate times the length of the measurement interval.
Reducing the uncertainty associated with these two key
measurements is a goal of the system and method set forth
in this description. If a process record is created at the time
when the process is created and if the process is sampled at
least once, then the exact starting time (i.e., birth) of the
process can usually be obtained. Furthermore, if the ending
time (i.e., death) of the process is also recorded and the
record is sampled at least once, then the exact length of the
process (i.e., the process lifetime) can be computed. In most
systems, however, the ending time record is not kept, and
therefore the exact ending time and length of the process are
unknown. Therefore, other methods must be used to
estimate the ending time and the process length.

Uncertainty arises from two primary sources: the
unsampled segment of a seen event or process, and the
unseen, short-lived event or process. FIG. 9 is a diagram
illustrating an unsampled segment of a seen event. The
horizontal line designated "Time" indicates increasing time
from left to right. The timeline encompasses all or part of the
measurement interval L. The vertical lines labeled s_{(i,1)-2}
through s_{(i,1)+3} indicate samples taken at a constant sample
interval Δ. The event or process 610 begins at the point in
time b_i and ends at the point in time d_i. The process 610
begins after sample s_{(i,1)-1} but before sample s_{i,1}, so the
process 610 is not detected at the point in time b_i when it
begins. However, the process 610 is still executing when
sample s_{i,1} is taken, so the existence of this process 610 is
known at that point. In other words, the process 610 is a seen
process or a known process as soon as the first sample is
taken. Furthermore, in a preferred embodiment, the starting
time b_i of the process 610 is also determined when the
process 610 is detected at sample s_{i,1}. After it has first been
sampled, the process 610 continues executing for an
indefinite period of time, as indicated in FIG. 9 by broken lines,
wherein the process lifetime may or may not encompass
additional samplings of the process 610 at regular sample
intervals. The last sampling of the process 610, and therefore
the last time the process 610 is seen, is the m_i-th sample at
the point in time s_{i,m_i}. The present invention contemplates
that a seen process may be sampled only once, and thus that
s_{i,1} = s_{i,m_i} in some cases. The process 610 stops executing at the
point in time d_i, after s_{i,m_i} but prior to s_{(i,m_i)+1}. In the preferred
embodiment, however, no record is kept of the termination
of the process 610, and so the length of the process 610 after
s_{i,m_i} is unknown. Therefore, the known, captured, or sampled
length 612 of the seen process 610 is represented by the
difference between s_{i,m_i} and b_i. The unsampled or unknown
length 614 of the seen process 610 is represented by the
difference between d_i and s_{i,m_i}. The unsampled segment 614
is also known as the residual process time. The captured
utilization is the sampled length 612 divided by the
measurement interval L.
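The quantities in FIG. 9 can be made concrete with a short Python sketch. It assumes, purely for illustration, that samples are taken at instants 0, Δ, 2Δ, and so on, and that a process is hit by a sample if the instant falls in the half-open range from its start to its end; the function names are hypothetical.

```python
import math

def sample_hits(b, d, delta):
    """Sample instants k * delta (k = 0, 1, 2, ...) that fall within [b, d)."""
    hits = []
    k = math.ceil(b / delta)
    while k * delta < d:
        hits.append(k * delta)
        k += 1
    return hits

def sampled_and_residual(b, d, delta):
    """Return (sampled length, residual length) for a seen process, else None."""
    hits = sample_hits(b, d, delta)
    if not hits:
        return None                    # unseen process: no sample fell inside it
    last = hits[-1]                    # last sampling time of the process
    return last - b, d - last          # sampled length 612, unsampled length 614

def captured_utilization(sampled_length, measurement_interval):
    """Sampled length divided by the measurement interval L."""
    return sampled_length / measurement_interval
```

For a process that starts at time 3 and ends at time 12 with Δ = 5, the samples at 5 and 10 both hit it, so the sampled length is 7 and the residual process time is 2. In practice the end time is not recorded, which is why the residual must be estimated rather than computed this way.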

FIG. 10 is a diagram illustrating an unseen event. Again,
the horizontal line designated "Time" indicates increasing
time from left to right and encompasses all or part of the
measurement interval L. The vertical lines labeled s_0
through s_3 indicate samples taken at a constant sample
interval Δ. The event or process 620 begins at the point in
time b_i and ends at the point in time d_i. In this instance,
however, the process 620 begins and ends within the sample
interval Δ and between two samples. Therefore, the process
620 is unseen and its length is known only to be less than Δ.
The unseen length 622 is represented by the difference
between d_i and b_i.
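Whether a process is seen or unseen depends only on whether a sample instant falls inside its lifetime. The Python sketch below illustrates this with a hypothetical list of (start, end) pairs; the sampling grid at multiples of Δ and the name `is_seen` are assumptions made for the example.

```python
import math

def is_seen(b, d, delta):
    """True if at least one sample instant k * delta lies within [b, d)."""
    return math.ceil(b / delta) * delta < d

# Hypothetical processes as (start, end) pairs, sampled every delta = 5 time units.
delta = 5
processes = [(1, 3), (4, 6), (7, 9), (11, 12), (13, 21)]

unseen = [(b, d) for b, d in processes if not is_seen(b, d, delta)]
unseen_length = sum(d - b for b, d in unseen)
```

Note that the process from 4 to 6 is seen even though it is as short as the unseen ones, simply because it happens to straddle a sample instant. Every unseen process is necessarily shorter than Δ.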

For a computer system or plurality of systems with hundreds or thousands of processes starting and ending within a measurement interval, the uncertainty adds up rapidly and can distort a performance model. However, statistical methods according to the present invention can provide estimations of the uncertain data, thus recreating the lost data and reducing uncertainty. FIG. 11 is a flowchart illustrating an overview of the statistical estimation of metric data. The difference between the "actual" total utilization and the "sampled" total utilization (in other words, the uncertainty) can be distributed both to the unsampled segments of the seen events or processes and to the unseen events or processes. Accordingly, in step 720 of FIG. 11 the total uncaptured utilization is estimated. U_uc represents an estimate of the total unsampled utilization of all seen processes over the measurement interval L. In step 722 the total unseen utilization is estimated. U_us represents an estimate of the total utilization of all unseen processes for the measurement interval L.

FIG. 12 is a flowchart illustrating the determination of the total uncaptured utilization U_uc. In step 738 the measurement interval L is determined. The steps thereafter are performed for measurements within the interval L. In step 740 one or more process service time distributions are determined, wherein the quantity of distributions is labeled d. A process service time distribution is a statistical distribution which determines the duration of one or more processes. In step 742 the quantity n_cp of seen processes which follow each distribution is determined. In other words, in steps 740 and 742 the seen processes are divided into d groups, wherein each group represents processes that are characterized by the same process service time distribution. In step 744 a mean residual time is determined for each process service time distribution j. The mean residual time r̄_j represents the average expected difference d_i - s_{i,m_i} for all the processes which are characterized by the same process service time distribution j. In other words, the mean residual time r̄_j represents the average unsampled length related to the unsampled segment 614.

In step 746 the total uncaptured utilization U_uc is determined according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} \bar{r}_j \, n_{cp_j}}{L}$$

wherein d is the number of process service time distributions, r̄_j is the mean residual time for each distribution j, n_cpj is the number of seen processes for each distribution j, and L is the measurement interval. In other words, the total uncaptured utilization U_uc is the sum of the products of the mean residual time and the number of seen processes for each distribution, all divided by the measurement interval. If there is only one process service time distribution, however, then the total uncaptured utilization U_uc can be determined according to a simplified equation:

$$U_{uc} = \frac{\bar{r} \, n_{cp}}{L}$$

wherein r̄ is the mean residual time for the distribution, n_cp is the number of seen processes for the distribution, and L is the measurement interval.

FIG. 13 illustrates the general determination of the total uncaptured utilization U_uc for any process service time distribution. In step 750 the process service time distribution is determined. In step 752 the quantity n_cp of seen processes which follow this distribution is determined.

In step 754 the conditional probability G_t(r) of residual time R ≤ r, given that the process time X > t, is determined as follows:

$$G_t(r) = \frac{P(t < X \le t + r)}{P(X > t)}$$

wherein t is the last sample time and r is the unsampled segment length.

In step 756 the mean residual time r̄ for the distribution is determined according to the following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r)$$

wherein Δ is the sample interval. With steps 750 through 756, the system and method are applicable to any process service time distribution. Nevertheless, a discussion of several exemplary distributions follows.

FIG. 14 is a flowchart illustrating the determination of the portion of the total uncaptured utilization for an exponential distribution. In step 760 the process service time distribution is determined to be an exponential distribution with service rate λ. In step 762 the quantity n_cp of seen processes which follow the exponential distribution with service rate λ is determined. In step 764 the mean residual time r̄ for the exponential distribution is determined according to the following equation:

$$\bar{r} = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda \Delta}$$

wherein Δ is the sample interval.

FIG. 15 is a flowchart illustrating the determination of the portion of the total uncaptured utilization for a uniform distribution. In step 780 the process service time distribution is determined to be a uniform distribution between zero and a constant C. In step 782 the quantity n_cp of seen processes which follow the uniform distribution between zero and C is determined. In step 784 the mean residual time r̄ for the uniform distribution is determined according to the following equation:

$$\bar{r} = \frac{\min(C - t, \Delta)}{2}$$

wherein t is the average difference between the last sampling time and the beginning time (the average sampled length when the measurement interval is normalized to 1) for the seen processes which follow the uniform distribution between zero and C, and thus wherein 0 ≤ t ≤ C, and wherein Δ is the sample interval. The mean residual time for the uniform distribution depends upon the sampled process time: the more time that has already been captured, the less the expected residual time.

FIG. 16 is a flowchart illustrating the determination of the portion of the total uncaptured utilization for an unknown distribution. In step 800 the process service time distribution is determined to be an unknown distribution. In step 802 the quantity n_cp of seen processes which follow the unknown distribution is determined. In step 804 the mean residual time r̄ for the unknown distribution is determined according to the following equation:

$$\bar{r} = \frac{1}{n} \sum_{i=1}^{n} \max(0,\; s_{i,1} - b_i)$$

wherein n is the total quantity of processes, s_{i,1} is the first sampling time for each process i, s_{i,1} = 0 for an unseen process, and b_i is a beginning time for each process i. The rationale




for the equation is that the residual time mirrors s_{i,1} - b_i, the process time prior to the first sample time.

FIG. 17 is a flowchart illustrating an alternative method of the determination of the portion of the total uncaptured utilization for an unknown distribution. In step 820 the process service time distribution is determined to be an unknown distribution. In step 822 the quantity n_cp of seen processes which follow the unknown distribution is determined. In step 824 the mean residual time r̄ for the unknown distribution is determined according to the following equation:

$$\bar{r} = \frac{\sum_{i \in CP} (s_{i,1} - b_i)}{n_{cp}}$$

wherein CP is a set of all seen processes which follow the unknown distribution, s_{i,1} is the first sampling time for each seen process i∈CP, and b_i is the beginning time for each process i∈CP. The rationale for the equation is that the residual time mirrors s_{i,1} - b_i, the process time prior to the first sample time.

FIG. 18 is a flowchart illustrating the determination of the total unseen utilization. In step 840 a total captured utilization U_c is determined. The total captured utilization U_c is the sum of the sampled lengths of all seen processes over the measurement interval L and can be computed as follows:

$$U_c = \frac{\sum_{i \in CP} (s_{i,m_i} - b_i)}{L}$$

wherein CP is a set of all seen processes, s_{i,m_i} is the last sampling time for each seen process i∈CP, b_i is the beginning time for each seen process i∈CP, and L is the measurement interval.

In step 842 a total measured utilization U_m is determined. The total measured utilization represents the total utilization of all processes of interest, seen and unseen, over the measurement interval L. Step 842 assumes that universal utilization statistics are available from the registry of metrics, system software, or other monitoring software. In step 844 the total unseen utilization U_us is determined according to the following equation:

$$U_{us} = U_m - U_c - U_{uc}$$

In other words, the uncertainty is the difference between the "actual" utilization U_m and the "sampled" utilization U_c. The uncertainty (i.e., U_m - U_c) is the sum of the uncaptured utilization U_uc and the unseen utilization U_us, so the unseen utilization U_us can be determined once the uncaptured utilization U_uc and the uncertainty are known. As discussed above, FIG. 12 illustrates how the uncaptured utilization U_uc can be computed in one embodiment.

If, however, universal utilization statistics are not available, then the total measured utilization U_m cannot easily be determined. Nevertheless, the total unseen utilization U_us may still be determined according to an iterative method which is described as follows. The following method is also useful when the processes of interest do not represent all the activity on the computer system. If so, it is assumed that the processes are marked in such a way that it can be determined which ones are of interest. Furthermore, it is assumed in all cases that the start time of processes is independent of the sample time. Therefore, if there are very many short-lived processes, then a certain percentage of them will be seen. Because the percentage of short-lived, seen processes can be estimated as described below, the total number of unseen processes can be estimated as well.

The iterative method for [...] unseen processes will function for any [...] simplicity of computation, however, let the sample interval Δ=1. Application of the iterative method to a sample interval [...] placed into a plurality of buckets. In a computer-implemented version of this method, a bucket would preferably be a memory location, and the plurality of buckets would preferably be an ordered plurality of memory locations such as a matrix, one-dimensional array, linked list, or other ordered data structure. The quantity of buckets is m×n, wherein n is the maximum number of times that any process has been sampled or hit, and wherein m is an arbitrary multiple of n. Preferably, m should be chosen such that there will be, on average, at least 20 processes in each bucket. The m buckets are divided evenly [...], as illustrated by FIG. 19.

A process is placed into one of the buckets in the tth [...], wherein t is the number of times that the process was sampled or hit. [...] buckets represent [...] segments of the sample interval Δ. A process sampled t times and starting within the ith segment of the sample interval is placed into the bucket labeled with the following value:

(t - 1)[...] + i, wherein i = 0, 1, [...]

For example, let the maximum observed time for a process be e_max < 1. Because e_max < Δ, no process is seen more than once, and thus n=1. A derivative assumption is that no process lives longer than e_max; otherwise, a longer-lived process probably would have been seen. FIG. 20 illustrates this example with m buckets [...]. Thus, the last nonempty bucket will be bucket B_l, [...]. In other words, [...]. As shown in FIG. 20, all of the processes in bucket B_l will have length between [...] and e_max.

Let bucket B_i have a quantity of processes f_i of length between [...]. The [...] of uniformity of start time implies [...] processes [...] and e_max will not be seen at all because they began and terminated too early, and that others of the same length will have been placed in other buckets because they started
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closer to the sampling time and continued to live afterwards. and 

In fact, for any α<e_max, the number of processes that start between α and […] after a sample is taken and have length between i/m and e_max will be approximately f_i as well. Therefore, the total number of processes of length between i/m and e_max is approximately m·f_i. Of course, those processes of the same length that started early in the collection cycle will be unseen. Therefore, the total number of unseen processes of length between i/m and e_max is approximately m(1-e_max)f_i.

In the next bucket, B_(i-1), all processes have length at least (i-1)/m. The initial quantity of processes in this bucket is counted. Then subtract from the initial quantity the estimated quantity of these processes that have length greater than i/m. The remaining processes, whose quantity we will designate f_(i-1), have length between (i-1)/m and i/m. The total number of processes of length between (i-1)/m and i/m is f_(i-1)·m, and of that quantity, (m-i) are unseen. In this way, the number of processes in the remaining buckets can be calculated by calculating one bucket at a time and proceeding to the next bucket (i.e., the bucket with the next shortest processes).

In general, to compute the quantity of processes of length between (j-1)/m and j/m, look at the quantity in the bucket labeled (j-1). Subtract from this quantity the estimates of the longer processes (previously calculated) that landed in this bucket. This new quantity is f_(j-1). Multiply by m, since all m buckets in the sample interval Δ are equally likely to have the processes of this length. Of this number, the fraction that is unseen is […].

In this way, the number of processes of a given length less than the sample interval can be calculated. As shown in FIG. 21 and as described in detail above, in step 860 create m×n buckets. In step 862 place each of the seen processes into the appropriate bucket, as described above. In step 864 start at the highest-ranked bucket: that is, the bucket with the longest processes. In step 866 count the number of processes in the current bucket (in the first case, the highest-ranked bucket). In step 868, as described in detail above, subtract from the count the fraction of the processes that were previously counted for higher-ranked buckets (when looking at the highest-ranked bucket, subtract zero processes). In step 870, multiply the difference by m, the number of buckets per sample interval Δ. In step 872, estimate the fraction of the product of step 870 which are unseen processes. In step 874, decide whether this is the lowest-ranked bucket. If it is the lowest-ranked bucket, then we stop. If it is not the lowest-ranked bucket, then in step 876 look at the bucket of the next lower rank and go through the process again, starting at step 866. With this iterative technique, the number of unseen processes can be estimated for any number of segments of the sample interval Δ.
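The loop of steps 864 through 876 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the sample interval is normalized to 1, every process is shorter than the interval (each is seen at most once), bucket j holds the seen processes attributed to segment j, and a process of length class j is treated as having the upper-bound length (j+1)/m, so that a fraction (m-j-1)/m of that class goes unseen. The function name `estimate_unseen` is hypothetical.

```python
def estimate_unseen(bucket_counts):
    """Iterate from the highest-ranked bucket down (steps 864-876).

    bucket_counts[j] is the number of seen processes placed in bucket j.
    Returns (totals, unseen): the estimated total and unseen process
    counts for each length class. Assumes a unit sample interval and
    processes shorter than that interval (both are assumptions).
    """
    m = len(bucket_counts)
    totals = [0.0] * m
    unseen = [0.0] * m
    spill = 0.0  # longer processes already attributed to each lower bucket
    for j in range(m - 1, -1, -1):
        # Step 866/868: count, then subtract previously estimated longer processes.
        own = max(bucket_counts[j] - spill, 0.0)
        # Step 870: all m buckets are equally likely, so multiply by m.
        totals[j] = own * m
        # Step 872: assumed unseen fraction for this length class.
        unseen[j] = totals[j] * (m - j - 1) / m
        # Each class contributes `own` processes to every lower-ranked bucket.
        spill += own
    return totals, unseen


if __name__ == "__main__":
    totals, unseen = estimate_unseen([15, 15, 10, 10])
    print(totals, unseen)
```

In the usage example, 40 long processes spread evenly over four buckets and 20 half-interval processes spread over the two lowest buckets produce the counts 15, 15, 10, 10, and the loop recovers both populations.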

If we do not assume that the longest process is shorter than the sample interval Δ, then buckets representing processes seen only once may actually contain processes of length greater than Δ. However, by looking at buckets representing processes that were seen more than once, we can iteratively estimate the number of processes in each bucket that represent processes of minimal possible length by subtracting an estimate of the number of processes in the bucket that had longer length. In other words, the technique described above also encompasses n>1, where n is the maximum number of times that any process is sampled.



The iterative technique described above is useful even when the total measured utilization U_m is known. With this technique, the length distribution of unseen processes can be determined. The distribution should be proportional to the number of unseen processes in each bucket. Let U_us(i) be the utilization of unseen processes of length between i/m and e_max, and let U_us(j) be the utilization of unseen processes of length between j/m and (j+1)/m, wherein j=0,1,2, . . . , i-1. The utilization for the unseen processes can be distributed according to the two equations illustrated in FIGS. 22 and 23, respectively. In other words, the length distribution of unseen processes is determined by multiplying the total unseen utilization U_us by a coefficient, wherein the coefficient is derived from the iterative method.

In one embodiment, the enterprise is modeled and/or its configuration is altered in response to the determination(s) of utilization described herein. Modeling according to one embodiment is discussed in detail with reference to FIGS. 6 and 7. In various embodiments, this modeling may further comprise one or more of the following: displaying the determination(s) to a user, predicting future performance, graphing a performance prediction, generating reports, asking a user for further data, and permitting a user to modify a model of the enterprise. In one embodiment, Analyze 406 and/or Predict 408, as discussed in detail with reference to FIGS. 6 and 7, implement the modeling, analysis, and/or prediction in response to the determination(s) of utilization. In one embodiment, a configuration of the enterprise is altered in response to the determination(s) of utilization. Altering a configuration of the enterprise may comprise, for example, reconfiguring a network topology or installing additional resources, such as CPUs, software, memory resources, or network routers or hubs.

Although the system and method of the present invention have been described in connection with several embodiments, the invention is not intended to be limited to the specific forms set forth herein, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the spirit and scope of the invention as defined by the appended claims.

What is claimed is:

1. A method for monitoring the state of a computer system, the method comprising:




collecting a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
one or more processes on the computer system; 
storing the set of raw data points in a memory; 
sampling the memory repetitively at a sample interval Δ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization U_uc, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes of the one or more processes over the measurement interval;
statistically estimating a total unseen utilization U_us, wherein the total unseen utilization is an estimation of a total length of the unseen processes of the one or more processes over the measurement interval.
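The distinction between seen and unseen processes can be illustrated with a short sketch. It assumes, purely for illustration, that samples are taken at integer multiples of the sample interval starting at time zero and that a process is seen when it is alive at a sampling instant; the function names are hypothetical.

```python
import math


def sample_times(begin, end, delta):
    """Return (first, last) sampling instants that fall within [begin, end].

    Sampling instants are assumed to be k * delta for k = 0, 1, 2, ...
    Returns None when no instant falls inside the process lifetime,
    i.e. the process is unseen.
    """
    first_k = math.ceil(begin / delta)
    last_k = math.floor(end / delta)
    if first_k > last_k:
        return None
    return first_k * delta, last_k * delta


def split_seen_unseen(processes, delta):
    """Partition (begin, end) pairs into seen and unseen lists."""
    seen, unseen = [], []
    for begin, end in processes:
        hit = sample_times(begin, end, delta)
        if hit is None:
            unseen.append((begin, end))
        else:
            seen.append((begin, end, hit[0], hit[1]))
    return seen, unseen


if __name__ == "__main__":
    print(split_seen_unseen([(0.5, 2.5), (1.2, 1.8)], 1.0))
```

A process living from 0.5 to 2.5 is seen at instants 1.0 and 2.0, while one living from 1.2 to 1.8 falls entirely between samples and is unseen.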
2. The method of claim 1,
wherein the statistically estimating a total uncaptured utilization U_uc further comprises:
determining a process service time distribution, wherein the process service time distribution estimates a duration of one or more processes;
determining a quantity n_cp of seen processes which follow the process service time distribution;
determining a mean residual time r̄ for the process service time distribution, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process;
determining the total uncaptured utilization U_uc according to the following equation:

\[ U_{uc} = \frac{n_{cp}\,\bar{r}}{L} \]

wherein L is the measurement interval.
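As a worked illustration of this step, the sketch below assumes the total is the number of seen processes multiplied by the mean residual time and divided by the measurement interval, which is the form suggested by the definitions in the claim. The function and parameter names are hypothetical.

```python
def uncaptured_utilization(n_cp, mean_residual, interval_length):
    """Estimate U_uc for one service time distribution.

    n_cp            -- quantity of seen processes following the distribution
    mean_residual   -- mean residual time per seen process
    interval_length -- measurement interval L (same time unit as the residual)
    """
    if interval_length <= 0:
        raise ValueError("measurement interval must be positive")
    return n_cp * mean_residual / interval_length


if __name__ == "__main__":
    # 40 seen processes, each losing 0.25 s on average, over a 100 s interval.
    print(uncaptured_utilization(40, 0.25, 100.0))
```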
3. The method of claim 2,
wherein the determining a mean residual time r̄ further comprises:
determining a conditional probability function G_t(r) for the process service time distribution, wherein G_t(r) is a conditional probability that a residual time R≤r, given that a process time X>t, wherein t is the last sampling time and r is a difference between the last sampling time and a process ending time, and wherein G_t(r) is determined according to the following equation:

\[ G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)}; \]

determining the mean residual time r̄ according to the following equation:

\[ \bar{r} = \int_0^{\Delta} r \, dG_t(r), \]

wherein Δ is the sample interval.

4. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is an exponential distribution with a service rate λ;
wherein the determining a quantity n_cp of seen processes comprises determining a quantity n_cp of seen processes which follow the exponential distribution with the service rate λ;
wherein the determining a mean residual time r̄ further comprises determining the mean residual time r̄ according to the following equation:

[…]

wherein Δ is the sample interval.

5. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is a uniform distribution between zero and a constant C;
wherein the determining a quantity n_cp of seen processes comprises determining a quantity n_cp of seen processes which follow the uniform distribution between zero and C;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time for each seen process;
wherein the determining a mean residual time r̄ further comprises determining the mean residual time r̄ according to the following equation:

\[ \bar{r} = \frac{\min(C - \bar{t},\, \Delta)}{2} \]

wherein t̄ is the average difference between the last sampling time and the beginning time for the seen processes which follow the uniform distribution between zero and C, wherein 0≤t̄≤C, and wherein Δ is the sample interval.

6. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is an unknown distribution;
wherein the determining a quantity n_cp of seen processes comprises determining a quantity n_cp of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time b_i for each seen process;
wherein the determining a mean residual time r̄ further comprises determining the mean residual time r̄ according to the following equation:

\[ \bar{r} = \frac{\sum_{i=1}^{n} \max[0,\, (s_{i1} - b_i)]}{\sum_{i=1,\; s_{i1} > 0}^{n} 1} \]

wherein n is a total quantity of processes, s_i1 is the first sampling time for each process i, and b_i is a beginning time for each process i.

7. The method of claim 2, 

wherein the determining a process service time distribu- 
tion comprises determining that the process service 
time distribution is an unknown distribution; 

wherein the determining a quantity n_cp of seen processes comprises determining a quantity n_cp of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time b_i for each seen process;
wherein the determining a mean residual time r̄ further comprises determining the mean residual time r̄ according to the following equation:

\[ \bar{r} = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cp}} \]

wherein CP is a set of all seen processes which follow the unknown distribution and s_i1 is the first sampling time for each seen process i∈CP.

8. The method of claim 1,
wherein the statistically estimating a total uncaptured utilization U_uc further comprises:
determining a plurality d of process service time distributions, wherein each process service time distribution j estimates a duration of one or more processes, wherein 1≤j≤d;
for each process service time distribution j, determining a quantity n_cpj of seen processes which follow that process service time distribution j;
for each process service time distribution j, determining a mean residual time r̄_j for that process service time distribution j, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process which follows that process service time distribution j;
determining the total uncaptured utilization U_uc according to the following equation:

\[ U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L} \]

wherein L is the measurement interval.
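With several service time distributions, the per-distribution contributions are summed before dividing by the measurement interval. The following sketch assumes that reading of the claim; the function name and the (count, mean residual) pair representation are hypothetical.

```python
def uncaptured_utilization_multi(groups, interval_length):
    """Estimate U_uc over d service time distributions.

    groups          -- iterable of (n_cpj, mean_residual_j) pairs, one per distribution j
    interval_length -- measurement interval L
    """
    if interval_length <= 0:
        raise ValueError("measurement interval must be positive")
    return sum(n * r for n, r in groups) / interval_length


if __name__ == "__main__":
    # Two distributions: 10 processes losing 0.5 s each, 20 losing 0.25 s each.
    print(uncaptured_utilization_multi([(10, 0.5), (20, 0.25)], 100.0))
```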
9. The method of claim 8,
wherein the determining a mean residual time r̄_j for each process service time distribution further comprises:
determining a conditional probability function G_t(r) for each process service time distribution, wherein G_t(r) is a conditional probability that a residual time R≤r, given that a process time X>t, wherein t is the last sampling time and r is a difference between the last sampling time and a process ending time, and wherein G_t(r) is determined according to the following equation:

\[ G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)}; \]

determining the mean residual time r̄_j according to the following equation:

\[ \bar{r}_j = \int_0^{\Delta} r \, dG_t(r), \]

wherein Δ is the sample interval.
10. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is an exponential distribution with a service rate λ;
wherein the determining a quantity n_cpj of seen processes for each process service time distribution further comprises determining a quantity n_cpj of seen processes which follow the exponential distribution with the service rate λ;
wherein the determining a mean residual time r̄_j for each process service time distribution further comprises determining the mean residual time r̄_j for the exponential distribution with the service rate λ according to the following equation:

[…]

wherein Δ is the sample interval.

11. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is a uniform distribution between zero and a constant C;
wherein the determining a quantity n_cpj of seen processes for each process service time distribution further comprises determining a quantity n_cpj of seen processes which follow the uniform distribution between zero and C;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time for each seen process;
wherein the determining a mean residual time r̄_j for each process service time distribution further comprises determining the mean residual time r̄_j for the uniform distribution between zero and C according to the following equation:

\[ \bar{r}_j = \frac{\min(C - \bar{t},\, \Delta)}{2} \]

wherein t̄ is the average difference between the last sampling time and the beginning time for the seen processes which follow the uniform distribution between zero and C, wherein 0≤t̄≤C, and wherein Δ is the sample interval.

12. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is an unknown distribution;
wherein the determining a quantity n_cpj of seen processes for each process service time distribution further comprises determining a quantity n_cpj of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time b_i for each seen process;
wherein the determining a mean residual time r̄_j for each process service time distribution further comprises determining the mean residual time r̄_j for the unknown distribution according to the following equation:

\[ \bar{r}_j = \frac{\sum_{i=1}^{n} \max[0,\, (s_{i1} - b_i)]}{\sum_{i=1,\; s_{i1} > 0}^{n} 1} \]

wherein n is a total quantity of processes, s_i1 is the first sampling time for each process i, and b_i is a beginning time for each process i, and wherein s_i1=0 for each unseen process and s_i1>0 for each seen process.

13. The method of claim 8, 

wherein the determining a plurality of process service 
time distributions further comprises determining that 
one of the process service time distributions is an 
unknown distribution; 

wherein the determining a quantity n_cpj of seen processes for each process service time distribution further comprises determining a quantity n_cpj of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time b_i for each seen process;
wherein the determining a mean residual time r̄_j for each process service time distribution further comprises determining the mean residual time r̄_j for the unknown distribution according to the following equation:

\[ \bar{r}_j = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cpj}} \]

wherein CP is a set of all seen processes which follow the unknown distribution and s_i1 is the first sampling time for each seen process i∈CP.

14. The method of claim 1, further comprising:
determining a total captured utilization U_c, wherein the total captured utilization measures a total length of sampled segments for the one or more seen processes over the measurement interval;
determining a total measured utilization U_m, wherein the total measured utilization U_m measures a total length of all of the one or more processes over the measurement interval;
wherein the statistically estimating a total unseen utilization U_us further comprises determining the total unseen utilization according to the following equation:

\[ U_{us} = U_m - U_c - U_{uc} \]

wherein U_uc is the total uncaptured utilization.
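When the total measured utilization is available, the unseen utilization is what remains after the captured and uncaptured parts are removed, as the description states when it calls the uncertainty the sum of the uncaptured and unseen utilizations. A minimal sketch follows, with a hypothetical function name and illustrative figures.

```python
def unseen_utilization(u_measured, u_captured, u_uncaptured):
    """Return U_us = U_m - U_c - U_uc.

    All three inputs are utilizations over the same measurement interval.
    """
    return u_measured - u_captured - u_uncaptured


if __name__ == "__main__":
    # 90% measured, 60% captured by sampling, 10% estimated as uncaptured residuals.
    print(unseen_utilization(0.9, 0.6, 0.1))
```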
15. The method of claim 14,
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time b_i for each seen process;
wherein the determining a total captured utilization U_c further comprises determining U_c according to the following equation:

\[ U_c = \frac{\sum_{i \in CP} (s_{il} - b_i)}{L} \]

wherein CP is a set of all seen processes, s_il is the last sampling time for each seen process i∈CP, b_i is the beginning time for each seen process i∈CP, and L is the measurement interval.
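The captured utilization can be computed directly from the sampled data, since it depends only on each seen process's beginning time and last sampling time. The sketch below assumes the captured segment of a process runs from its beginning time to its last sampling time; the function name and tuple layout are hypothetical.

```python
def captured_utilization(seen, interval_length):
    """Return U_c from (beginning_time, last_sampling_time) pairs.

    seen            -- iterable of (b_i, last_sample_i) for every seen process
    interval_length -- measurement interval L
    """
    if interval_length <= 0:
        raise ValueError("measurement interval must be positive")
    return sum(last - begin for begin, last in seen) / interval_length


if __name__ == "__main__":
    # Two seen processes captured for 1.5 s and 0.75 s within a 10 s interval.
    print(captured_utilization([(0.5, 2.0), (3.25, 4.0)], 10.0))
```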

16. The method of claim 1,
wherein the statistically estimating a total unseen utilization U_us further comprises:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval Δ, wherein each segment corresponds to a bucket.

17. The method of claim 1,
wherein the statistically estimating a total unseen utilization U_us further comprises:
creating a plurality of buckets with m rows and n columns, wherein n is a maximum number of samples in the set of sampled data points for any particular process, wherein m is a multiple of n, and wherein the buckets are ordered from zero to m-1;
placing each seen process into one of the plurality of buckets, wherein the bucket is labeled according to the following equation:

[…]

wherein t is a total quantity of samples in the set of sampled data points for this process, wherein i indicates one of m equal divisions of the sample interval Δ such that i = 0, 1, . . . , m-1;
estimating a total quantity of unseen processes for each of m equal length segments of the sample interval Δ, comprising:
counting a total quantity f_i of processes of the greatest length segment contained in the highest-numbered bucket which contains at least one process;
multiplying f_i by m;
determining a fraction of m×f_i which are unseen processes;
iteratively estimating a total quantity of unseen processes for each lesser length segment of the sample interval Δ, comprising:
counting an initial quantity of processes of the next lesser length segment contained in the next lower-numbered bucket;
calculating a difference of the initial quantity and a fraction of previously calculated longer processes;
calculating a product of the difference and m;
determining a fraction of the product which are unseen processes.

18. The method of claim 1, further comprising:
determining a length distribution of the unseen processes of a greatest length, comprising multiplying the total unseen utilization by a first coefficient;
determining a length distribution of the unseen processes of a lesser length, comprising multiplying the total unseen utilization by a second coefficient;
wherein the first coefficient and second coefficient are derived from an iterative method, wherein the iterative method comprises:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval Δ, wherein each length segment corresponds to a bucket.

19. The method of claim 1, 

wherein the memory is a registry of metrics. 

20. The method of claim 1, 

wherein the collecting a set of raw data points, the storing 
the set of raw data points in a memory, and sampling 
the memory are performed continually and repetitively 
over the measurement interval. 

21. The method of claim 1, 

wherein the collecting a set of raw data points is per- 
formed a plurality of times at a collecting frequency; 
wherein the sampling the memory is performed a plurality 

of times at a sampling frequency; 
wherein the sampling frequency is less than the collecting 
frequency. 

22. The method of claim 1, 

wherein the collecting a set of raw data points, the storing the set of raw data points in a memory, the sampling the memory, the statistically estimating a total uncaptured utilization U_uc, and the statistically estimating a total unseen utilization U_us are performed on a single computer system.

23. The method of claim 1, 

wherein the collecting a set of raw data points is performed on a different computer system than the statistically estimating a total uncaptured utilization U_uc and the statistically estimating a total unseen utilization U_us.

24. The method of claim 1, further comprising:
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization U_uc and the statistically estimating a total unseen utilization U_us.

25. The method of claim 1, further comprising:
altering a configuration of the computer system based on the statistically estimating a total uncaptured utilization U_uc and the statistically estimating a total unseen utilization U_us.

26. A method for monitoring the state of a computer system, the method comprising:
collecting a set of raw data points over a measurement interval L, wherein the set of raw data points relates to one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval Δ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization U_uc, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes of the one or more processes over the measurement interval, comprising:
determining a plurality d of process service time distributions, wherein each process service time distribution j estimates a duration of one or more processes, wherein 1≤j≤d;
for each process service time distribution j, determining a quantity n_cpj of seen processes which follow that process service time distribution j;
for each process service time distribution j, determining a mean residual time r̄_j for that process service time distribution j, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process which follows that process service time distribution j;
determining the total uncaptured utilization U_uc according to the following equation:

\[ U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L} \]

statistically estimating a total unseen utilization U_us, wherein the total unseen utilization is an estimation of a total length of the unseen processes of the one or more processes over the measurement interval;
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization U_uc and the statistically estimating a total unseen utilization U_us.

27. A method for monitoring the state of a computer system, the method comprising:
collecting a set of raw data points over a measurement interval L, wherein the set of raw data points relates to one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval Δ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization U_uc, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes over the measurement interval;
statistically estimating a total unseen utilization U_us, wherein the total unseen utilization is an estimation of a total length of the unseen processes over the measurement interval, comprising:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval Δ, wherein each segment corresponds to a bucket;
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization U_uc and the statistically estimating a total unseen utilization U_us.

28. A system for monitoring the state of a computer 
system, the system comprising: 

a CPU; 

a system memory coupled to the CPU, wherein the system 
memory stores one or more computer programs execut- 
able by the CPU; 
wherein the computer programs are executable to: 
collect a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
a set of processes on the computer system; 
store the set of raw data points in a memory; 
sample the memory repetitively at a sample interval Δ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimate a total uncaptured utilization U_uc, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes over the measurement interval;
statistically estimate a total unseen utilization U_us, wherein the total unseen utilization is an estimation of a total length of the unseen processes over the measurement interval.
29. The system of claim 28,
wherein in statistically estimating the total uncaptured utilization U_uc, the computer programs are executable to:
determine a process service time distribution, wherein the process service time distribution estimates a duration of one or more processes;
determine a quantity n_cp of seen processes which follow the process service time distribution;
determine a mean residual time r̄ for the process service time distribution, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process;
determine the total uncaptured utilization U_uc according to the following equation:

\[ U_{uc} = \frac{n_{cp}\,\bar{r}}{L} \]

wherein L is the measurement interval.
30. The system of claim 29,
wherein in determining a mean residual time r̄, the computer programs are further executable to:
determine a conditional probability function G_t(r) for the process service time distribution, wherein G_t(r) is a conditional probability that a residual time R≤r, given that a process time X>t, wherein t is the last sampling time and r is a difference between the last sampling time and a process ending time, and wherein G_t(r) is determined according to the following equation:

\[ G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)}; \]

determine the mean residual time r̄ according to the following equation:

\[ \bar{r} = \int_0^{\Delta} r \, dG_t(r), \]

wherein Δ is the sample interval.
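For a service time distribution given by its cumulative distribution function, the conditional probability function and the mean residual time can be evaluated numerically. The sketch below is illustrative only: it assumes the integral runs from zero to the sample interval, approximates it with a midpoint Riemann-Stieltjes sum, and uses hypothetical names throughout. The exponential check at the end compares against the elementary closed form of that same integral, which is plain calculus and is not asserted to be the equation recited for the exponential case.

```python
import math


def mean_residual(cdf, t, delta, steps=10_000):
    """Approximate the integral of r dG_t(r) over [0, delta].

    cdf   -- cumulative distribution function F of the process time X
    t     -- last sampling time (process known to satisfy X > t)
    delta -- sample interval
    """
    tail = 1.0 - cdf(t)  # P(X > t)
    if tail <= 0:
        raise ValueError("P(X > t) must be positive")

    def g(r):
        # G_t(r) = P(t < X <= t + r) / P(X > t)
        return (cdf(t + r) - cdf(t)) / tail

    h = delta / steps
    total = 0.0
    for k in range(steps):
        r_mid = (k + 0.5) * h
        total += r_mid * (g((k + 1) * h) - g(k * h))
    return total


if __name__ == "__main__":
    lam, delta = 2.0, 1.0
    exp_cdf = lambda x: 1.0 - math.exp(-lam * x)
    closed = (1.0 - math.exp(-lam * delta) * (1.0 + lam * delta)) / lam
    print(mean_residual(exp_cdf, 0.3, delta), closed)
```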
31. The system of claim 29,
wherein in determining a process service time distribution, the computer programs are executable to determine that the process service time distribution is an exponential distribution with a service rate λ;
wherein in determining a quantity n_cp of seen processes, the computer programs are executable to determine a quantity n_cp of seen processes which follow the exponential distribution with the service rate λ;
wherein in determining a mean residual time r̄, the computer programs are executable to determine the mean residual time r̄ according to the following equation:

[…]

wherein Δ is the sample interval.
32. The system of claim 29,
wherein in determining a process service time distribution, the computer programs are executable to determine that the process service time distribution is a uniform distribution between zero and a constant C;
wherein in determining a quantity n_cp of seen processes, the computer programs are executable to determine a quantity n_cp of seen processes which follow the uniform distribution between zero and C;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time for each seen process;
wherein in determining a mean residual time r̄, the computer programs are executable to determine the mean residual time r̄ according to the following equation:

\[ \bar{r} = \frac{\min(C - \bar{t},\, \Delta)}{2} \]

wherein t̄ is the average difference between the last sampling time and the beginning time for the seen processes which follow the uniform distribution between zero and C, wherein 0≤t̄≤C, and wherein Δ is the sample interval.
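The uniform case reduces to simple arithmetic once the average observed age of the seen processes is known. The sketch below assumes the mean residual time is half of the smaller of the remaining lifetime bound and the sample interval, as the claim's definitions suggest; the function name and the list-of-ages input are hypothetical.

```python
def uniform_mean_residual(c, ages, delta):
    """Mean residual time for service times uniform on [0, c].

    c     -- upper bound C of the uniform distribution
    ages  -- (last sampling time - beginning time) for each seen process
    delta -- sample interval
    """
    if not ages:
        raise ValueError("at least one seen process is required")
    t_bar = sum(ages) / len(ages)
    if not 0 <= t_bar <= c:
        raise ValueError("average age must lie between 0 and C")
    return min(c - t_bar, delta) / 2


if __name__ == "__main__":
    # Average age 3 s, C = 10 s, sample interval 5 s: the interval is the binding limit.
    print(uniform_mean_residual(10.0, [2.0, 4.0], 5.0))
```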

33. The system of claim 29, 

wherein in determining a process service time 
distribution, the computer programs are executable to 
determine that the process service time distribution is 
an unknown distribution; 

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}$, the computer
programs are executable to determine the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{\sum_{i=1,\ \forall s_{i1} > 0}^{n} \max[0,\ (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i.
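
A minimal sketch of this empirical estimate follows. It assumes the convention, stated in a later claim, that the first sampling time is zero for an unseen process and positive for a seen one; the function and parameter names are hypothetical.

```python
def empirical_mean_residual(first_sample_times, begin_times):
    """Average of max(0, s_i1 - b_i) over processes with s_i1 > 0 (seen processes).

    Assumption: s_i1 == 0 marks an unseen process, so those entries are
    excluded from both the numerator and the denominator.
    """
    seen = [(s, b) for s, b in zip(first_sample_times, begin_times) if s > 0]
    if not seen:
        raise ValueError("no seen processes to average over")
    return sum(max(0.0, s - b) for s, b in seen) / len(seen)
```

The gap between a process's beginning and its first sample is used here as a stand-in for the unsampled tail after its last sample, which is why no distributional form has to be assumed.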

34. The system of claim 29, 

wherein in determining a process service time 
distribution, the computer programs are executable to 
determine that the process service time distribution is 
an unknown distribution; 

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}$, the computer
programs are executable to determine the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{1}{n_{cp}} \sum_{i \in CP} (s_{i1} - b_i)$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

35. The system of claim 28, 

wherein in statistically estimating a total uncaptured
utilization $U_{uc}$, the computer programs are executable to:

determine a set d of process service time distributions,
wherein each process service time distribution j
estimates a duration of one or more processes,
wherein $1 \le j \le d$;

for each process service time distribution j, determine
a quantity $n_{cpj}$ of seen processes which follow that
process service time distribution j;

for each process service time distribution j, determine
a mean residual time $\bar{r}_j$ for that process service time
distribution j, wherein the mean residual time estimates
a length of an uncaptured residual segment for
each seen process which follows that process service
time distribution j;

determine the total uncaptured utilization $U_{uc}$ according
to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\, \bar{r}_j}{L}$$

wherein L is the measurement interval.

36. The system of claim 35,

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer programs
are executable to:

determine a conditional probability function $G_t(r)$ for
each process service time distribution, wherein $G_t(r)$
is a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determine the mean residual time $\bar{r}_j$ according to the
following equation:

$$\bar{r}_j = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

37. The system of claim 35,

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time distributions
is an exponential distribution with a service rate $\lambda$;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the exponential distribution
with the service rate $\lambda$;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer programs
are executable to determine the mean residual
time $\bar{r}_j$ for the exponential distribution with the service
rate $\lambda$ according to the following equation:

wherein $\Delta$ is the sample interval.

38. The system of claim 35,

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time distributions
is a uniform distribution between zero and a
constant C;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the uniform distribution
between zero and C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer programs
are executable to determine the mean residual
time $\bar{r}_j$ for the uniform distribution between zero and C
according to the following equation:

$$\bar{r}_j = \frac{\min(C - \bar{t},\ \Delta)}{2}$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.
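
How the per-distribution quantities might combine into a single uncaptured-utilization figure can be sketched as below. The sketch assumes the total is the sum, over the d distributions, of the seen-process count times the mean residual time, divided by the measurement interval L; the function name and the input layout are hypothetical.

```python
def total_uncaptured_utilization(groups, measurement_interval):
    """Combine per-distribution counts and mean residuals into U_uc.

    groups: iterable of (n_cp_j, r_bar_j) pairs, one per service time
    distribution j (hypothetical layout).
    Assumed formula: U_uc = sum_j(n_cp_j * r_bar_j) / L.
    """
    if measurement_interval <= 0:
        raise ValueError("measurement interval L must be positive")
    return sum(n * r for n, r in groups) / measurement_interval
```

For example, ten seen processes with a mean residual of 0.5 and four with a mean residual of 1.25, over an interval of 100 time units, would contribute an estimated uncaptured utilization of 0.1.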

39. The system of claim 35, 

wherein in determining a set of process service time 
distributions, the computer programs are executable to 
determine that one of the process service time distri- 
butions is an unknown distribution; 

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer programs
are executable to determine the mean residual
time $\bar{r}_j$ for the unknown distribution according to the
following equation:

$$\bar{r}_j = \frac{\sum_{i=1,\ \forall s_{i1} > 0}^{n} \max[0,\ (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i, and wherein $s_{i1} = 0$ for each
unseen process and $s_{i1} > 0$ for each seen process.

40. The system of claim 35, 

wherein in determining a set of process service time 
distributions, the computer programs are executable to 
determine that one of the process service time distri- 
butions is an unknown distribution; 

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer programs
are executable to determine the mean residual
time $\bar{r}_j$ for the unknown distribution according to the
following equation:

$$\bar{r}_j = \frac{1}{n_{cpj}} \sum_{i \in CP} (s_{i1} - b_i)$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

41. The system of claim 28,

wherein the computer programs are further executable to:

determine a total captured utilization $U_c$, wherein the
total captured utilization measures a total length of
sampled segments for one or more seen processes
over the measurement interval;

determine a total measured utilization $U_m$, wherein the
total measured utilization $U_m$ measures a total length
of all of the one or more processes over the
measurement interval;

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to
determine the total unseen utilization $U_{us}$ according
to the following equation:

$$U_{us} = U_m - U_c - U_{uc},$$

wherein $U_{uc}$ is the total uncaptured utilization.

42. The system of claim 41,

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a total captured utilization $U_c$, the
computer programs are executable to determine $U_c$
according to the following equation:

$$U_c = \frac{\sum_{i \in CP} (s_{il} - b_i)}{L}$$

wherein CP is a set of all seen processes, $s_{il}$ is the last
sampling time for each seen process $i \in CP$, $b_i$ is the
beginning time for each seen process $i \in CP$, and L is the
measurement interval.

43. The system of claim 28,

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to:

create a plurality of buckets;

place each seen process into one of the plurality of
buckets;

estimate a total quantity of unseen processes for each of
a plurality of equal length segments of the sample
interval $\Delta$, wherein each segment corresponds to a
bucket.

44. The system of claim 28,

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to:

create a plurality of buckets with m rows and n
columns, wherein n is a maximum number of
samples in the set of sampled data points for any
particular process, wherein m is a multiple of n, and
wherein the buckets are ordered from zero to m-1;

place each seen process into one of the plurality of
buckets, wherein the bucket is labeled according to
the following equation:

$$(t - 1)\frac{m}{n} + i,$$

wherein t is a total quantity of samples in the set of
sampled data points for this process, wherein i indicates
one of $\frac{m}{n}$ equal divisions of the sample interval
$\Delta$ such that

$$i = 0, 1, \ldots, \frac{m}{n} - 1;$$

estimate a total quantity of unseen processes for each of
$\frac{m}{n}$ length segments of the sample interval $\Delta$,
wherein the computer programs are executable to:

count a total quantity $f_i$ of processes of the greatest
length segment contained in the highest-numbered
bucket which contains at least one process;

multiply $f_i$ by m;

determine a fraction of $m \times f_i$ which are unseen
processes;

iteratively estimate a total quantity of unseen processes
for each lesser length segment of the
sample interval $\Delta$, wherein the computer programs
are executable to:

count an initial quantity of processes of the next
lesser length segment contained in the next
lower-numbered bucket;

calculate a difference of the initial quantity and a
fraction of previously calculated longer processes;

calculate a product of the difference and m;

determine a fraction of the product which are
unseen processes.
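
The bucket labeling step can be illustrated with a short sketch. It assumes the label is (t - 1)(m/n) + i, where t is the number of samples in which the process appears and i is the index of the sub-division of the sample interval; how i is derived from the data is not shown here and is taken as given. The function name is hypothetical.

```python
def bucket_label(t, i, m, n):
    """Label (t - 1) * (m / n) + i of the bucket for one seen process.

    t: number of samples containing the process (1..n)
    i: sub-division index of the sample interval (0..m/n - 1), assumed given
    m: total number of buckets, a multiple of n
    n: maximum number of samples for any one process
    """
    if n <= 0 or m % n != 0:
        raise ValueError("m must be a positive multiple of n")
    per_sample = m // n  # number of equal divisions of the sample interval
    if not 1 <= t <= n:
        raise ValueError("t must lie between 1 and n")
    if not 0 <= i < per_sample:
        raise ValueError("i must lie between 0 and m/n - 1")
    return (t - 1) * per_sample + i
```

With m = 12 and n = 3, the labels run from 0 (one sample, first sub-division) to 11 (three samples, last sub-division), which matches buckets ordered from zero to m-1.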

45. The system of claim 28, 

wherein the computer programs are further executable to: 
determine a length distribution of the unseen processes
of a greatest length, comprising multiplying the total
unseen utilization $U_{us}$ by a first coefficient;
determine a length distribution of the unseen processes
of a lesser length, comprising multiplying the total
unseen utilization $U_{us}$ by a second coefficient;
wherein the first coefficient and second coefficient are 
derived from an iterative method, wherein in per- 
forming the iterative method to determine the first 
coefficient and second coefficient, the computer pro- 
grams are executable to: 
create a plurality of buckets; 
place each seen process into one of the plurality of 
buckets; 

estimate a total quantity of unseen processes for each
of a plurality of equal length segments of the
sample interval $\Delta$, wherein each length segment
corresponds to a bucket.

46. The system of claim 28, 

wherein the memory comprises a registry of metrics. 

47. The system of claim 28, 

wherein the computer programs are further executable to
modify a model of the computer system based on the
statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

48. The system of claim 28, 

wherein the computer programs are further executable to
alter a configuration of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

49. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement
interval L, wherein the set of raw data points relates to
one or more processes on the computer system;

storing the set of raw data points in a memory;

sampling the memory repetitively at a sample interval $\Delta$
to create a set of sampled data points, wherein processes
which are included in the set of sampled data
points are seen processes and processes which are not
included in the set of sampled data points are unseen
processes, and wherein the set of sampled data points
includes a first sampling time and a last sampling time
for each seen process;

statistically estimating a total uncaptured utilization $U_{uc}$,
wherein the total uncaptured utilization is an estimation
of a total length of unsampled segments for the seen
processes over the measurement interval;

statistically estimating a total unseen utilization $U_{us}$,
wherein the total unseen utilization is an estimation of
a total length of the unseen processes over the
measurement interval.
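
The distinction between seen and unseen processes can be made concrete with a small simulation. This is a sketch under assumptions that are not in the claims: samples are taken at times Δ, 2Δ, ... up to L, each process is described by a hypothetical (begin, end) pair, and a process is visible to a sample only if it is running at that instant.

```python
def sample_times(delta, measurement_interval):
    """Sampling instants delta, 2*delta, ... not exceeding the interval L."""
    count = int(measurement_interval // delta)
    return [k * delta for k in range(1, count + 1)]


def classify_processes(processes, delta, measurement_interval):
    """Split processes into seen (with first/last sampling time) and unseen.

    processes: dict mapping a process id to a (begin, end) pair
    (hypothetical input; real agents would read this from sampled data).
    """
    ticks = sample_times(delta, measurement_interval)
    seen, unseen = {}, []
    for pid, (begin, end) in processes.items():
        # A sample catches the process only while it is running.
        hits = [t for t in ticks if begin <= t < end]
        if hits:
            seen[pid] = (hits[0], hits[-1])  # first and last sampling time
        else:
            unseen.append(pid)
    return seen, unseen
```

A process that starts and ends between two consecutive samples never appears in the sampled data, which is the case the unseen-utilization estimate is meant to cover; a seen process still loses the segment after its last sample, which is the uncaptured part.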

50. The memory medium of claim 49, 

wherein the statistically estimating a total uncaptured
utilization $U_{uc}$ further comprises:

determining a process service time distribution,
wherein the process service time distribution estimates
a duration of one or more processes;

determining a quantity $n_{cp}$ of seen processes which
follow the process service time distribution;

determining a mean residual time $\bar{r}$ for the process
service time distribution, wherein the mean residual
time estimates a length of an uncaptured residual
segment for each seen process; and

determining the total uncaptured utilization $U_{uc}$ according
to the following equation:

$$U_{uc} = \frac{n_{cp}\, \bar{r}}{L}$$

wherein L is the measurement interval.

51. The memory medium of claim 50,

wherein the determining a mean residual time $\bar{r}$ further
comprises:

determining a conditional probability function $G_t(r)$ for
the process service time distribution, wherein $G_t(r)$ is
a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}$ according to the
following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

52. The memory medium of claim 50,

wherein the determining a process service time distribution
comprises determining that the process service
time distribution is an exponential distribution with a
service rate $\lambda$;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the exponential distribution with the
service rate $\lambda$;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$ according
to the following equation:

wherein $\Delta$ is the sample interval.

53. The memory medium of claim 50,

wherein the determining a process service time distribution
comprises determining that the process service
time distribution is a uniform distribution between zero
and a constant C;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the uniform distribution between zero and
C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$ according
to the following equation:

$$\bar{r} = \frac{\min(C - \bar{t},\ \Delta)}{2}$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.
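
The conditional-probability route to the mean residual time can be checked numerically. The sketch below evaluates the integral of r with respect to G_t(r) from 0 to Δ as a Riemann-Stieltjes sum, given only a survival function P(X > x). The helper names are hypothetical, and the exponential closed form used for comparison is derived here directly from that integral; it is not quoted from the claims.

```python
import math


def conditional_residual_cdf(survival, t):
    """Return G_t(r) = P(t < X <= t + r) / P(X > t) as a function of r."""
    s_t = survival(t)
    if s_t <= 0.0:
        raise ValueError("P(X > t) must be positive")

    def g(r):
        return (s_t - survival(t + r)) / s_t

    return g


def mean_residual(survival, t, delta, steps=10_000):
    """Approximate the integral of r dG_t(r) over [0, delta] by a midpoint sum."""
    g = conditional_residual_cdf(survival, t)
    width = delta / steps
    total = 0.0
    previous = g(0.0)
    for k in range(steps):
        current = g((k + 1) * width)
        total += (k + 0.5) * width * (current - previous)
        previous = current
    return total


def exponential_closed_form(rate, delta):
    """Integral of r * rate * exp(-rate * r) over [0, delta], derived by parts."""
    return (1.0 - (1.0 + rate * delta) * math.exp(-rate * delta)) / rate
```

For an exponential service time the result does not depend on the last sampling time t, because the distribution is memoryless; evaluating `mean_residual` at two different values of t with the same rate and Δ should give the same number up to numerical error.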

54. The memory medium of claim 50, 

wherein the determining a process service time distribu- 
tion comprises determining that the process service 
time distribution is an unknown distribution; 

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$ according
to the following equation:

$$\bar{r} = \frac{\sum_{i=1,\ \forall s_{i1} > 0}^{n} \max[0,\ (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i.

55. The memory medium of claim 50, 

wherein the determining a process service time distribu- 
tion comprises determining that the process service 
time distribution is an unknown distribution; 

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$ according
to the following equation:

$$\bar{r} = \frac{1}{n_{cp}} \sum_{i \in CP} (s_{i1} - b_i)$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.
56. The memory medium of claim 49, 
wherein the statistically estimating a total uncaptured
utilization $U_{uc}$ further comprises:

determining a set d of process service time
distributions, wherein each process service time
distribution j estimates a duration of one or more
processes, wherein $1 \le j \le d$;

for each process service time distribution j, determining
a quantity $n_{cpj}$ of seen processes which follow that
process service time distribution j;

for each process service time distribution j, determining
a mean residual time $\bar{r}_j$ for that process service time
distribution j, wherein the mean residual time estimates
a length of an uncaptured residual segment for
each seen process which follows that process service
time distribution j; and

determining the total uncaptured utilization $U_{uc}$ according
to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\, \bar{r}_j}{L}$$

wherein L is the measurement interval.

57. The memory medium of claim 56,

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises:

determining a conditional probability function $G_t(r)$ for
each process service time distribution, wherein $G_t(r)$
is a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}_j$ according to the
following equation:

$$\bar{r}_j = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

58. The memory medium of claim 56,

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is an exponential
distribution with a service rate $\lambda$;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further comprises
determining a quantity $n_{cpj}$ of seen processes
which follow the exponential distribution with the
service rate $\lambda$;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the exponential
distribution with the service rate $\lambda$ according to the
following equation:

wherein $\Delta$ is the sample interval.

59. The memory medium of claim 56,

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is a uniform
distribution between zero and a constant C;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further comprises
determining a quantity $n_{cpj}$ of seen processes
which follow the uniform distribution between zero and
C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the uniform
distribution between zero and C according to the following
equation:

$$\bar{r}_j = \frac{\min(C - \bar{t},\ \Delta)}{2}$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.

60. The memory medium of claim 56,

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is an unknown
distribution;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further comprises
determining a quantity $n_{cpj}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the unknown
distribution according to the following equation:

$$\bar{r}_j = \frac{\sum_{i=1,\ \forall s_{i1} > 0}^{n} \max[0,\ (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i, and wherein $s_{i1} = 0$ for each
unseen process and $s_{i1} > 0$ for each seen process.

61. The memory medium of claim 56, 

wherein the determining a set of process service time 
distributions further comprises determining that one of 
the process service time distributions is an unknown 
distribution; 

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further comprises
determining a quantity $n_{cpj}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the unknown
distribution according to the following equation:

$$\bar{r}_j = \frac{1}{n_{cpj}} \sum_{i \in CP} (s_{i1} - b_i)$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.
62. The memory medium of claim 49, 
wherein the program instructions further implement: 
determining a total captured utilization $U_c$, wherein the
total captured utilization measures a total length of
sampled segments for the one or more seen processes
over the measurement interval; and
determining a total measured utilization $U_m$, wherein
the total measured utilization measures a total length
of all of the one or more processes over the
measurement interval;
wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises determining the total
unseen utilization $U_{us}$ according to the following
equation:

$$U_{us} = U_m - U_c - U_{uc},$$

wherein $U_{uc}$ is the total uncaptured utilization.
63. The memory medium of claim 62, 
wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a total captured utilization $U_c$
further comprises determining $U_c$ according to the
following equation:

$$U_c = \frac{\sum_{i \in CP} (s_{il} - b_i)}{L}$$

wherein CP is a set of all seen processes, $s_{il}$ is the last
sampling time for each seen process $i \in CP$, $b_i$ is the
beginning time for each seen process $i \in CP$, and L is the
measurement interval.
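
The captured and unseen utilizations can be sketched together. The sketch assumes the captured utilization is the summed span from beginning time to last sampling time over all seen processes, divided by L, and that the unseen utilization is what remains of the measured utilization after the captured and uncaptured parts are subtracted. Function and parameter names are hypothetical.

```python
def captured_utilization(seen, measurement_interval):
    """U_c = sum over seen processes of (last sampling time - begin) / L.

    seen: iterable of (last_sample_time, begin_time) pairs (hypothetical layout).
    """
    if measurement_interval <= 0:
        raise ValueError("measurement interval L must be positive")
    return sum(s_last - b for s_last, b in seen) / measurement_interval


def unseen_utilization(u_measured, u_captured, u_uncaptured):
    """U_us = U_m - U_c - U_uc."""
    return u_measured - u_captured - u_uncaptured
```

For instance, two seen processes with spans of 2.5 and 0 time units over an interval of 10 give a captured utilization of 0.25; if the measured utilization is 0.5 and the uncaptured estimate is 0.125, the unseen utilization comes out at 0.125.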

64. The memory medium of claim 49, 
wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises:
creating a plurality of buckets;
placing each seen process into one of the plurality of
buckets; and
estimating a total quantity of unseen processes for each
of a plurality of equal length segments of the sample
interval $\Delta$, wherein each segment corresponds to a
bucket.

65. The memory medium of claim 49, 

wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises:

creating a plurality of buckets with m rows and n
columns, wherein n is a maximum number of
samples in the set of sampled data points for any
particular process, wherein m is a multiple of n, and
wherein the buckets are ordered from zero to m-1;

placing each seen process into one of the plurality of
buckets, wherein the bucket is labeled according to
the following equation:

$$(t - 1)\frac{m}{n} + i,$$

wherein t is a total quantity of samples in the set of
sampled data points for this process, wherein i
indicates one of $\frac{m}{n}$ equal divisions of the sample
interval $\Delta$ such that

$$i = 0, 1, \ldots, \frac{m}{n} - 1;$$

and

estimating a total quantity of unseen processes for
each of $\frac{m}{n}$ length segments of the sample interval
$\Delta$, comprising:

counting a total quantity $f_i$ of processes of the
greatest length segment contained in the
highest-numbered bucket which contains at
least one process;

multiplying $f_i$ by m;

determining a fraction of $m \times f_i$ which are unseen
processes; and

iteratively estimating a total quantity of unseen
processes for each lesser length segment of the
sample interval $\Delta$, comprising:

counting an initial quantity of processes of the
next lesser length segment contained in the
next lower-numbered bucket;

calculating a difference of the initial quantity
and a fraction of previously calculated longer
processes;

calculating a product of the difference and m;
and

determining a fraction of the product which are
unseen processes.

66. The memory medium of claim 49, 
wherein the program instructions further implement: 

determining a length distribution of the unseen processes
of a greatest length, comprising multiplying
the total unseen utilization $U_{us}$ by a first coefficient;
and

determining a length distribution of the unseen processes
of a lesser length, comprising multiplying the
total unseen utilization $U_{us}$ by a second coefficient;

wherein the first coefficient and second coefficient are
derived from an iterative method, wherein the iterative
method comprises:
creating a plurality of buckets;

placing each seen process into one of the plurality of
buckets; and

estimating a total quantity of unseen processes for
each of a plurality of equal length segments of the
sample interval $\Delta$, wherein each length segment
corresponds to a bucket.

67. The memory medium of claim 49, 
wherein the memory comprises a registry of metrics. 

68. The memory medium of claim 49, 
wherein the collecting a set of raw data points, the storing
the set of raw data points in a memory, and sampling
the memory are performed continually and repetitively
over the measurement interval.

69. The memory medium of claim 49, 
wherein the collecting a set of raw data points is per- 
formed a plurality of times at a collecting frequency; 

wherein the sampling the memory is performed a plurality
of times at a sampling frequency;

wherein the sampling frequency is less than the collecting
frequency.

70. The memory medium of claim 49, 

wherein the collecting a set of raw data points, the storing
the set of raw data points in a memory, the sampling the
memory, the statistically estimating a total uncaptured
utilization $U_{uc}$, and the statistically estimating a total
unseen utilization $U_{us}$ are performed on a single
computer system.



71. The memory medium of claim 49, 

wherein the collecting a set of raw data points is performed
on a different computer system than the statistically
estimating a total uncaptured utilization $U_{uc}$ and
the statistically estimating a total unseen utilization $U_{us}$.

72. The memory medium of claim 49, 

wherein the program instructions further implement
modifying a model of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

73. The memory medium of claim 49, 

wherein the program instructions further implement
altering a configuration of the computer system based
on the statistically estimating a total uncaptured
utilization $U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

74. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
one or more processes on the computer system; 
storing the set of raw data points in a memory; 
sampling the memory repetitively at a sample interval $\Delta$
to create a set of sampled data points, wherein processes
which are included in the set of sampled data
points are seen processes and processes which are not
included in the set of sampled data points are unseen
processes, and wherein the set of sampled data points
includes a first sampling time and a last sampling time
for each seen process;
statistically estimating a total uncaptured utilization $U_{uc}$,
wherein the total uncaptured utilization is an estimation
of a total length of unsampled segments for the seen
processes of the one or more processes over the
measurement interval, comprising:
determining a set d of process service time
distributions, wherein each process service time
distribution j estimates a duration of one or more
processes, wherein $1 \le j \le d$;
for each process service time distribution j, determining
a quantity $n_{cpj}$ of seen processes which follow that
process service time distribution j;
for each process service time distribution j, determining
a mean residual time $\bar{r}_j$ for that process service time
distribution j, wherein the mean residual time estimates
a length of an uncaptured residual segment for
each seen process which follows that process service
time distribution j; and
determining the total uncaptured utilization $U_{uc}$ according
to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\, \bar{r}_j}{L};$$

statistically estimating a total unseen utilization $U_{us}$,
wherein the total unseen utilization is an estimation
of a total length of the unseen processes of the one
or more processes over the measurement interval;
and
modifying a model of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.
75. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval $\Delta$
to create a set of sampled data points, wherein processes
which are included in the set of sampled data
points are seen processes and processes which are not
included in the set of sampled data points are unseen
processes, and wherein the set of sampled data points
includes a first sampling time and a last sampling time
for each seen process;
statistically estimating a total uncaptured utilization $U_{uc}$,
wherein the total uncaptured utilization is an estimation
of a total length of unsampled segments for the seen
processes over the measurement interval; and

statistically estimating a total unseen utilization $U_{us}$,
wherein the total unseen utilization is an estimation of
a total length of the unseen processes over the
measurement interval, comprising:
creating a plurality of buckets;

placing each seen process into one of the plurality of
buckets;

estimating a total quantity of unseen processes for each
of a plurality of equal length segments of the sample
interval $\Delta$, wherein each segment corresponds to a
bucket; and

modifying a model of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.